Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters

HP Part Number: B7660-90025
Published: September 2008

Legal Notices

© Copyright 2008 Hewlett-Packard Development Company, L.P.
Publication Date: 2008

Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Intel® and Itanium® are registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. Oracle® is a registered trademark of Oracle Corporation. UNIX® is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.

Table of Contents

Printing History
Preface
    Guide to Disaster Tolerant Solutions Documentation

1 Designing a Metropolitan Cluster
    Designing a Disaster Tolerant Architecture for use with Metrocluster Products
        Single Data Center
        Two Data Centers and Third Location with Arbitrator(s)
            Arbitrator Node Configuration Rules
            Disk Array Data Replication Configuration Rules
            Calculating a Cluster Quorum
            Example Failover Scenarios with One Arbitrator
            Example Failover Scenarios with Two Arbitrators
    Worksheets
        Disaster Tolerant Checklist
        Cluster Configuration Worksheet
        Package Configuration Worksheet
    Next Steps

2 Designing a Continental Cluster
    Understanding Continental Cluster Concepts
        Mutual Recovery Configuration
        Application Recovery in a Continental Cluster
        Monitoring over a Network
        Cluster Events
        Interpreting the Significance of Cluster Events
        How Notifications Work
            Alerts
            Alarms
            Creating Notifications for Failure Events
            Creating Notifications for Events that Indicate a Return of Service
        Maintenance Mode for Recovery Groups
            Moving a Recovery Group into Maintenance Mode
            Moving a Recovery Group out of the Maintenance Mode
        Performing Cluster Recovery
        Performing Recovery Group Rehearsal in Continentalclusters
        Notes on Packages in a Continental Cluster
            Startup and Switching Characteristics
            Network Attributes
        How Serviceguard commands work in a Continentalclusters
    Designing a Disaster Tolerant Architecture for use with Continentalclusters
        Mutual Recovery
        Serviceguard Clusters
        Data Replication
            Physical Data Replication using Special Environment files
            Multiple Recovery Pairs in a Continental Cluster
        Highly Available Wide Area Networking
        Data Center Processes
        Continentalclusters Worksheets
            Data Center Worksheet
            Recovery Group Worksheet
            Cluster Event Worksheet
    Preparing the Clusters
        Setting up and Testing Data Replication
        Configuring a Cluster without Recovery Packages
        Configuring a Cluster with Recovery Packages
        Configuring Recovery Groups with Rehearsal Packages
    Building the Continentalclusters Configuration
        Preparing Security Files
            Network Security Configuration Requirements
        Creating the Monitor Package
        Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters
            Configuring Shared Disk for the Maintenance Feature
            Configuring a Monitor Package for the Maintenance Feature
        Editing the Continentalclusters Configuration File
            Editing Section 1—Cluster Information
            Editing Section 2 – Recovery Groups
            Editing Section 3—Monitoring Definitions
            Selecting Notification Intervals
        Checking and Applying the Continentalclusters Configuration
        Starting the Continentalclusters Monitor Package
        Validating the Configuration
        Documenting the Recovery Procedure
        Reviewing the Recovery Procedure
    Testing the Continental Cluster
        Testing Individual Packages
        Testing Continentalclusters Operations
    Switching to the Recovery Packages in Case of Disaster
        Receiving Notification
        Verifying that Recovery is Needed
        Using the Recovery Command to Switch All Packages
        To Start the Failover Process
            How the cmrecovercl Command Works
        Forcing a Package to Start
    Restoring Disaster Tolerance
        Restore Clusters to their Original Roles
        Primary Packages Remaining on the Surviving Cluster
        Primary Packages Remaining on the Surviving Cluster using cmswitchconcl
        Newly Created Cluster Will Run Primary Packages
        Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups
    Performing a Rehearsal Operation in your Environment
    Maintaining a Continental Cluster
        Adding a Node to a Cluster or Removing a Node from a Cluster
        Adding a Package to the Continental Cluster
        Removing a Rehearsal Package from a Recovery Group
        Modifying a Recovery Group with a new Rehearsal Package
        Removing a Package from the Continental Cluster
        Changing Monitoring Definitions
        Checking the Status of Clusters, Nodes, and Packages
        Reviewing Messages and Log Files
        Deleting a Continental Cluster Configuration
        Renaming a Continental Cluster
        Checking Java File Versions
        Next Steps
    Support for Oracle RAC Instances in a Continentalclusters Environment
        Configuring the Environment for Continentalclusters to Support Oracle RAC
        Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration
        Initial Startup of Oracle RAC Instance in a Continentalclusters Environment
        Failover of Oracle RAC Instances to the Recovery Site
        Failback of Oracle RAC Instances After a Failover
        Rehearsing Oracle RAC Databases in Continentalclusters

3 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
    Files for Integrating XP Disk Arrays with Serviceguard Clusters
    Overview of Continuous Access XP Concepts
        PVOLs and SVOLs
        Device Groups and Fence Levels
            Fence Level of NEVER
            Fence Level of DATA
            Fence Level of ASYNC
                Continuous Access Link Timeout
                Consistency Group
                Limitations of Asynchronous Mode
                Other Considerations on Asynchronous Mode
        Continuous Access Journal Overview
            Journal Volume
            Pull-Based Replication
            Mitigation of Network Problems
            Fence Level
            Journal Group
            Journal Cache, Journal Volumes, and Inflow Control
            Continuous Access Journal Pair State
            Limitations of XP12000 Continuous Access Journal
                One-to-One Volume Copy Operations
                One-to-One Journal Group Operations
                Journal Group Requirement
            Configuring XP12000 Continuous Access Journal
                Registering Journal Volumes
                Data Replication Connections
                Metrocluster package vs. Journal Group
    Creating the Cluster
    Preparing the Cluster for Data Replication
        Creating the RAID Manager Configuration
            Pair Creation of Journal Groups
            Creating Continuous Access Journal Pair
            Sample Raid Manager Configuration File
            Notes on the Raid Manager Configuration
            Configuring Automatic Raid Manager Startup
        Defining Storage Units
            Creating and Exporting LVM Volume Groups using Continuous Access XP
            Creating VxVM Disk Groups using Continuous Access XP
            Validating VxVM Disk Groups using Metrocluster/Continuous Access Data Replication
    Configuring Packages for Disaster Recovery
    Completing and Running a Metrocluster Solution with Continuous Access XP
        Maintaining a Cluster that uses Metrocluster with Continuous Access XP
            Viewing the Progress of Copy Operations
            Viewing Side File Size
            Viewing the Continuous Access Journal Status
            Viewing the Pair and Journal Group Information - Raid Manager using the “pairdisplay” Command
            Viewing the Journal Volumes Information - Raid Manager using the “raidvchkscan” Command
            Normal Maintenance
            Resynchronizing
            Using the pairresync Command
            Failback
            Timing Considerations
            Data maintenance with the failure of a Metrocluster Continuous Access XP Failover
                Swap Takeover Failure (Asynchronous/Journal mode)
                Takeover Timeout (for Continuous Access Journal mode)
                PVOL-PAIR with SVOL-PSUS(SSWS) State (for Continuous Access Journal Mode)
        XP Continuous Access Device Group Monitor
            XP/Continuous Access Device Group Monitor Operation Overview
            Configuring the Monitor
                Configure the Monitor’s Variables in the Package Environment File
                Configure XP/Continuous Access Device Group Monitor as a Service of the Package
                Configuring the XP/Continuous Access Device Group Monitor as a Service in the Site Controller Package
            Troubleshooting the XP/Continuous Access Device Group Monitor
    Completing and Running a Continental Cluster Solution with Continuous Access XP
        Setting up a Primary Package on the Primary Cluster
        Setting up a Recovery Package on the Recovery Cluster
        Setting up the Continental Cluster Configuration
        Switching to the Recovery Cluster in Case of Disaster
        Failback Scenarios
            Scenario 1
            Scenario 2
            Failback in Scenarios 1 and 2
            Failback when the Primary has SMPL Status
        Maintaining the Continuous Access XP Data Replication Environment
            Resynchronizing
                Using the pairresync Command
            Some Further Points

4 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
    Files for Integrating the EVA with Serviceguard Clusters
    Overview of EVA and Continuous Access EVA Concepts
        Metrocluster with EVA and Data Replication
            DR Groups
                DR Group Properties
            Log Disk
        Copy Sets
        Managed Sets
        Failover
        Continuous Access EVA Management Software
    Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA
        Setting up the Storage Hardware
        Cluster Configuration
        Management Server/SMI-S and DR Groups Configuration
        Defining Management Server and SMI-S Information
            Creating the Management Server List
            Creating the Management Server Mapping File
            Setting a Default Management Server
            Displaying the List of Management Servers
            Adding or Updating Management Server Information
            Deleting a Management Server
        Defining EVA Storage Cells and DR Groups
            Creating the Storage Map File
            Copying the Storage Map File
            Displaying Information about Storage Devices
        Verifying the EVA Configuration
        Configuring Volume Groups
            Identifying Special Device File Name for Vdisk in DR Group using Secure Path V3.0D or V3.0E
            Identifying Special Device Files using Secure Path v3.0F
            Identifying Special Device Files for PVLinks Configuration
            Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F
            Configuring Volume Groups using PVLinks
            Importing Volume Groups on Nodes at the Same Site
            Importing Volume Groups on Nodes at the Remote Site
    Building a Metrocluster Solution with Continuous Access EVA
        Configuring Packages for Automatic Disaster Recovery
        Maintaining a Cluster that Uses Metrocluster Continuous Access EVA
            Continuous Access EVA Link Suspend and Resume Modes
            Normal Maintenance
            Failback
            Cluster Re-Configuration
    Completing and Running a Continental Cluster Solution with Continuous Access EVA
        Setting up a Primary Package on the Primary Cluster
        Setting up a Recovery Package on the Recovery Cluster
        Setting up the Continental Cluster Configuration
        Switching to the Recovery Cluster in Case of Disaster
        Failover to Recovery Site
        Failover Scenarios
            Scenario 1
                Failback to the Primary Site
            Scenario 2
                Failback to the Primary Site
            Scenario 3
                Failback in Scenario 3
            Reconfiguring Recovery Group Site Identities in Continentalclusters after a Recovery

5 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
    Files for Integrating Serviceguard with EMC SRDF
    Overview of EMC and SRDF Concepts
    Preparing the Cluster for Data Replication
        Installing the Necessary Software
        Building the Symmetrix CLI Database
        Determining Symmetrix Device Names on Each Node
    Building a Metrocluster Solution with EMC SRDF
        Setting up 1 by 1 Configurations
            Creating Symmetrix Device Groups
            Configuring Gatekeeper Devices
            Verifying the EMC Symmetrix Configuration
            Creating and Exporting Volume Groups
            Importing Volume Groups on Other Nodes
            Configuring PV Links
        Grouping the Symmetrix Devices at Each Data Center
        Setting up M by N Configurations
            Creating Symmetrix Device Groups
            Configuring Gatekeeper Devices
            Creating the Consistency Groups
            Creating Volume Groups
            Creating VxVM Disk Groups using Metrocluster with EMC SRDF
            Validating VxVM Disk Groups using Metrocluster with EMC SRDF
            Additional Examples of M by N Configurations
        Configuring Serviceguard Packages for Automatic Disaster Recovery
        Maintaining a Cluster that uses Metrocluster with EMC SRDF
        Managing Business Continuity Volumes
            Protecting against Rolling Disasters
            Using the BCV in Resynchronization
        R1/R2 Swapping
            R1/R2 Swapping using Metrocluster SRDF
            R1/R2 Swapping using Manual Procedures
    Some Further Points
    Metrocluster with SRDF/Asynchronous Data Replication
        Overview of SRDF/Asynchronous Concepts
        Requirements for using SRDF/Asynchronous in a Metrocluster Environment
            Hardware Requirements
            Software Requirements
        Preparing the Cluster for SRDF/Asynchronous Data Replication
            Metrocluster SRDF Topology using SRDF/Asynchronous
    Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous
        Building a Device Group for SRDF/Asynchronous
        Package Configuration using SRDF/Synchronous or SRDF/Asynchronous
            First-time installation of Metrocluster with EMC SRDF using SRDF/Synchronous
            Pre-existing Installations of Metrocluster SRDF using SRDF/Synchronous
            Migration of Existing Applications from SRDF/Synchronous to SRDF/Asynchronous
        Package Failover using SRDF/Asynchronous
        Protecting against a Rolling Disaster
        Limitations and Restrictions
    Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication
        Overview of SRDF/Asynchronous MSC Concepts
        Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi-Session Consistency (MSC) Data Replication
            Building a Composite Group for SRDF/Asynchronous MSC
        Configuring a Package using SRDF/Asynchronous MSC
            Initial installation of Metrocluster with EMC SRDF using SRDF/Synchronous
            Metrocluster with EMC SRDF is already installed
        Setting up the RDF Daemon
            Starting and Stopping the Daemon
    Building a Continental Cluster Solution with EMC SRDF
        Setting up a Primary Package on the Primary Cluster
        Setting up a Recovery Package on the Recovery Cluster
        Setting up the Continental Cluster Configuration
        Switching to the Recovery Cluster in Case of Disaster
        Failback Scenarios
            Scenario 1
            Scenario 2
        Maintaining the EMC SRDF Data Replication Environment
            Normal Startup
            Normal Maintenance

6 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture
    Overview of Three Data Center Concepts
    Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP
        Overview of HP XP StorageWorks Three Data Center Architecture
            XP 3DC Multi-Target Bi-Link Configuration
            Three Data Center Multi-Hop Bi-Link Configuration
            HP StorageWorks Mirror Unit Descriptors
        Configuring an XP Three Data Center Solution
            Creating the Serviceguard Clusters
            Creating the Continental Cluster
        HP StorageWorks RAID Manager Configuration
            Creating the RAID Manager Configuration
            Multi-Target Raid Manager Configuration
                Sample Raid Manager Configuration on a DC1 NodeA (multi-target bi-link)
                Sample Raid Manager Configuration on a DC2 NodeB (multi-target bi-link)
                Sample Raid Manager Configuration on a DC3 NodeC (multi-target bi-link)
            Multi-Hop Raid Manager Configuration
                Sample Raid Manager Configuration on a DC1 NodeA (multi-hop-bi-link)
                Sample Raid Manager Configuration on a DC2 NodeB (multi-hop-bi-link)
                Sample Raid Manager Configuration on a DC3 NodeC (multi-hop-bi-link)
            Alternative to HORCM_DEV
            11. Restart the Raid Manager instance so that the new information in the configuration file is read
            12. Repeat steps 2 through 11 on each host that runs this particular application package
            Creating Device Group Pairs
                Identification of HP-UX device files
            LVM Volume Groups Configuration
            VxVM Configuration
        Package Configuration in a Three Data Center Environment
    Timing Considerations
    Bandwidth for Continuous Access and Application Recovery Time
    Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover
        Swap Takeover Failure (for Continuous Access Sync Pair)
        Takeover Timeout (for third data center)
        Continuous Access-Journal Device Group PVOL-PAIR with SVOL-PSUS(SSWS) State
    Failback Scenarios
        Failback from Data Center 3 (DC3)
            MULTI-HOP-BI-LINK (DC1 > DC2 > DC3) Data Recovery from DC3 to DC1
            MULTI-TARGET-BI-LINK (DC2 > DC1 > DC3) Data Recovery from DC3 to DC1
    Additional Reading

7 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture
    Overview of Site Aware Disaster Tolerant Architecture
        Components of SADTA
            Site
            Oracle Clusterware Sub-cluster
            Cluster File System Sub-cluster
            Complex Workload Packages
            Site Controller Package
            Site Safety Latch
    Overview of SADTA Configuration
    SADTA and Oracle Database 10gR2 RAC
    Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture
        Summary of Required Procedures
            Checklist for Configuring SADTA
        Sample Configuration
        Configuring SADTA
        Setting up Replication
        Configuring Metrocluster
            Creating a Serviceguard Cluster with Sites Configured
            Configuring the Cluster File System Multi Node Package (SMNP)
        Installing and Configuring Oracle Cluster Ready Service (CRS)
            Configuring the Network
            Configuring the Storage Device for Installing Oracle CRS
            Setting Up CRS OCR and VOTING Directories
            Installing and Configuring Oracle CRS
            Configuring SGeRAC Toolkit Packages for the site CRS Sub-cluster
        Installing and Configuring Oracle Real Application Clusters (RAC)
        Creating the RAC Database
            Setting up CFS File Systems for RAC Database Data Files
            Setting up CFS File Systems for RAC Database Flash Recovery
            Creating the RAC Database using the Oracle Database Configuration Assistant
            Configuring and Testing RAC MNP Stack at the Local Disk Site
            Halting the RAC Database on the Local Disk Site
        Creating Identical RAC Database at the Remote Site
            Configuring the Replica RAC Database
            Configuring the RAC MNP Stack at the Target Disk Site
            Halting the RAC Database on the Target Disk Site
        Configuring the Site Controller Package
        Configuring the Site Safety Latch Dependencies
        Starting the Disaster Tolerant RAC Database in the Metrocluster
        Configuring Client Access for Oracle Database 10gR2 RAC
        Configuring SGeRAC Cluster Interconnect Subnet Monitoring
        Configuring and Administration Restrictions
    Understanding Site Failover in a Site Aware Disaster Tolerant Architecture
        Node Failure
        Site Failure
        Site Failover
        Site Controller Package Failure
        Network Partitions Across Sites
        Disk Array and SAN Failure
        Replication Link Failure
        Oracle Database 10gR2 RAC Failure
        Oracle Database 10gR2 RAC Instance Failure
        Oracle Database 10gR2 RAC Oracle Clusterware Daemon Failure
    Administering the Site Aware Disaster Tolerant Metrocluster Environment
        Maintaining a Node
        Online Addition and Deletion of Nodes
            Adding Nodes Online on a Primary Site where the RAC Database is Running
            Adding Nodes Online on a Remote Site where the RAC Database is Down
            Deleting Nodes Online on the Primary Site where the RAC Database Package Stack is Running
            Deleting Nodes Online on the Site where the RAC Database Package Stack is Down
        Maintaining the Site
        Maintaining the Metrocluster Environment File
        Maintaining Site Controller Package
        Starting a Disaster Tolerant Oracle Database 10gR2 RAC
        Shutting Down a Disaster Tolerant Oracle Database 10gR2 RAC
        Halting and Restarting the RAC Database MNP Packages
        Maintaining Oracle Database 10gR2 RAC MNP packages on a Site
        Maintaining Oracle Database 10gR2 RAC
        Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site
    Limitations of a Site Aware Disaster Tolerant Architecture
    Troubleshooting
        Logs and Files
        Cleaning the Site to Restart the Site Controller Package
        Identifying and Cleaning RAC MNP Stack Packages that are Halted
        Understanding Site Controller Package Logs

A Environment File Variables for Serviceguard Integration with Continuous Access XP
B Environment File Variables for Metrocluster Continuous Access EVA
C Environment File Variables for Metrocluster with EMC SRDF
D Configuration File Parameters for Continentalclusters
E Continentalclusters Command and Daemon Reference
F Metrocluster Command Reference for Preview Utility
    Overview of Data Replication Storage Failover Preview
    Command Reference
    Sample Output of the cmdrprev Command
G Data Replication Rehearsal in a Sample Environment
    Setup Environment
        Device Group Configuration Changes
        Rehearsal Package Configuration
        Primary Package Metrocluster Environment File
        Continentalclusters Configuration
    Rehearsing Failure for a Single Instance Application
H Site Aware Disaster Tolerant Architecture Configuration Work Sheet
    Metrocluster Site Configuration
    Replication Configuration
    CRS Sub-cluster Configuration – using CFS
    RAC Database Configuration
    Site Controller Package Configuration
Glossary
Index

List of Figures

1-1   Two Data Centers and Third Location with Arbitrators
1-2   Failover Scenario with a Single Arbitrator
1-3   Failover Scenario with Two Arbitrators
1-4   Disaster Tolerant Checklist
1-5   Cluster Configuration Worksheet
1-6   Package Configuration Worksheet
1-7   Package Control Script Worksheet
2-1   Sample Continentalclusters Configuration
2-2   Sample Mutual Recovery Configuration
2-3   Continental Cluster After Recovery
2-4   Multiple Recovery Pair Configuration in a Continental Cluster
2-5   Sample Local Cluster Configuration
2-6   Sample Cluster Configuration with Recovery Packages
2-7   Sample Continentalclusters Recovery Groups
2-8   Sample Bi-directional Recovery Groups
2-9   Continentalclusters Configuration Files
2-10  Recovery Checklist
2-11  Oracle RAC Instances in a Continentalclusters Environment
2-12  Sample Oracle RAC Instances in a Continentalclusters Environment After Failover
2-13  Continentalclusters Configuration Files in a Recovery Pair with RAC Support
3-1   XP Series Primary and Secondary Volume Definitions
3-2   XP Series Disk Array Side File
3-3   Journal Based Replication
3-4   Disaster Tolerant Cluster
3-5   Q-Marker and Q-CNT
4-1   Configuration of Virtual Disks and DR groups
4-2   EVA Configuration Checklist
4-3   EVA Command View for the WWN Identifier
4-4   EVA Command View DR Group Properties
5-1   EMC R1 and R2 Definitions
5-2   Sample syminq Output from a Node on the R1 Side
5-3   Sample syminq Output from a Node on the R2 Side
5-4   Parsing the Symmetrix Serial Number
5-5   Sample symrdf list Output from R1 Side
5-6   Sample symrdf list Output from R2 Side
5-7   Mapping HP-UX Device File Names to Symmetrix Units
5-8   2 X 2 Node and Data Center Configuration with Consistency Groups
5-9   Devices and Symmetrix Units in M by N Configurations
5-10  2 by 1 Configuration
5-11  Bidirectional 2 by 2 Configuration
5-12  SRDF/Asynchronous Basic Functionality
5-13  Metrocluster Topology using SDRF/Asynchronous
5-14  Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication
6-1   Three Data Center Solution Overview
6-2   Three Data Center Architecture
6-3   XP Three Data Center Multi-Target Bi-Link Configuration Data Replication
6-4   3DC Multi-Hop Bi-Link Configuration Data Replication
6-5   Mirror Unit Descriptors
6-6   Mirror Unit Descriptor Usage
6-7   Multi-Target Bi-Link (1:2)
6-8   Multi-Hop Bi-Link (1:1:1)
7-1   Complex Workload with Package Dependencies Configured
7-2   Package View
7-3   Sample Configuration

List of Tables

1     Editions and Releases
2     Disaster Tolerant Solutions Document Road Map
1-1   Supported System and Data Center Combinations
1-2   Node Failure Scenarios with One Arbitrator
1-3   Node Failure Scenarios with Two Arbitrators
2-1   Monitored States and Possible Causes
2-2   Impact of Maintenance Mode
2-3   Serviceguard and Continentalclusters Commands
2-4   Data Replication and Continentalclusters
2-5   Continentalclusters Data Replication Package Structure
2-6   Status of Continentalclusters Packages Before Recovery
2-7   Status of Continentalclusters Packages After Recovery
2-8   Supported Continentalclusters and RAC Configuration
3-1   Metrocluster/Continuous Access Template Files
4-1   Metrocluster Continuous Access EVA Template Files
4-2   Individual Management Server Information
5-1   Metrocluster with EMC SRDF Template Files
5-2   Mapping for a 4 Node Cluster connected to 2 Symmetrix Arrays
5-3   RETRY and RETRYTIME Values
7-1   Packages Monitored by the Site Controller Package
7-2   CRS Sub-clusters configuration in the Metrocluster
7-3   Sample database configuration
7-4   Error Messages and their Resolution
A-1   AUTO_FENCEDATA_SPLIT
A-2   AUTO_NONCURDATA
A-3   AUTO_PSUEPSUS
A-4   AUTO_PSUSSSSWS
A-5   AUTO_SVOLPFUS
A-6   AUTO_SVOLPSUE
A-7   AUTO_SVOLPSUS......................................................398
F-1   Command Exit Value and its Description.............................424
H-1   Site Configuration.................................................431
H-2   Replication Configuration..........................................432
H-3   Configuring a CRS Sub-cluster using CFS............................433
H-4   RAC Database Configuration.........................................434
H-5   Site Controller Package Configuration..............................436

Printing History

Table 1 Editions and Releases

Printing Date    Part Number    Edition    Operating System Releases (see Note below)
December 2006    B7660-90019    Edition 1  HP-UX 11i v1 and 11i v2
September 2007   B7660-90021    Edition 2  HP-UX 11i v1, 11i v2 and 11i v3
December 2007    B7660-90023    Edition 3  HP-UX 11i v1, 11i v2 and 11i v3
September 2008   B7660-90025    Edition 4  HP-UX 11i v1, 11i v2 and 11i v3

The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number changes when extensive technical changes are incorporated. New editions of this manual will incorporate all material updated since the previous edition.

NOTE: This document describes a group of separate software products that are released independently of one another. Not all products described in this document are necessarily supported on all the same operating system releases. Consult your product’s Release Notes for information about supported platforms.

HP Printing Division:
ESS Software Division
Hewlett-Packard Co.
19111 Pruneridge Ave.
Cupertino, CA 95014

Preface

The following two guides describe disaster tolerant cluster solutions using Serviceguard, Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, Metrocluster EMC SRDF, and Continentalclusters:
• Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters

The Understanding and Designing Serviceguard Disaster Tolerant Architectures guide provides an overview of Hewlett-Packard disaster tolerant high availability cluster technologies and how to configure an extended distance cluster using Serviceguard. It is assumed you are already familiar with Serviceguard high availability concepts and configurations. The contents are as follows:
• Chapter 1, Disaster Tolerance and Recovery in a Serviceguard Cluster, is an overview of disaster tolerant cluster configurations.
• Chapter 2, Building an Extended Distance Cluster Using Serviceguard, shows the creation of disaster tolerant cluster solutions using extended distance Serviceguard cluster configurations.

The Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters guide provides detailed, task-oriented documentation on how to configure, manage, and set up disaster tolerant clusters using Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, Metrocluster EMC SRDF, Continentalclusters, and the Three Data Center Architecture.
The contents are as follows:
• Chapter 1, Designing a Metropolitan Cluster, shows the creation of disaster tolerant cluster solutions using the metropolitan cluster products.
• Chapter 2, Designing a Continental Cluster, shows the creation of disaster tolerant solutions using the Continentalclusters product.
• Chapter 3, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP, shows how to integrate physical data replication via Continuous Access XP with metropolitan and continental clusters.
• Chapter 4, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA, shows how to integrate physical data replication via Continuous Access EVA with metropolitan and continental clusters.
• Chapter 5, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF, shows how to integrate physical data replication via EMC Symmetrix disk arrays. Also, it shows the configuration of a special continental cluster that uses more than two disk arrays.
• Chapter 6, Designing a Disaster Tolerant Solution Using the Three Data Center Architecture, shows how to integrate synchronous replication (for data consistency) and Continuous Access journaling (for long-distance replication) using Serviceguard, Metrocluster Continuous Access XP, Continentalclusters and HP StorageWorks XP 3DC Data Replication Architecture.
• Chapter 7, Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture, describes how to configure a site aware disaster tolerant architecture in a Metrocluster with Oracle Database 10gR2 RAC.

A set of appendixes and a glossary provide additional reference information.

Table 2 outlines the types of disaster tolerant solutions and their related documentation.
Guide to Disaster Tolerant Solutions Documentation

Use the following table as a guide for locating specific Disaster Tolerant Solutions documentation:

Table 2 Disaster Tolerant Solutions Document Road Map

To set up: Extended Distance Cluster for Serviceguard/Serviceguard Extension for RAC
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    • Chapter 2: Building an Extended Distance Cluster Using Serviceguard

To set up: Metrocluster with Continuous Access XP
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 1: Designing a Metropolitan Cluster
    • Chapter 3: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

To set up: Metrocluster with Continuous Access EVA
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 1: Designing a Metropolitan Cluster
    • Chapter 4: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA

To set up: Metrocluster with EMC SRDF
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 1: Designing a Metropolitan Cluster
    • Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

To set up: Continental Cluster
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 2: Designing a Continental Cluster

To set up: Continental Cluster using Continuous Access XP data replication
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 2: Designing a Continental Cluster
    • Chapter 3: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

To set up: Continental Cluster using Continuous Access EVA data replication
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 2: Designing a Continental Cluster
    • Chapter 4: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA

To set up: Continental Cluster using EMC SRDF data replication
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 2: Designing a Continental Cluster
    • Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

To set up: Continental Cluster using other data replication
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 2: Designing a Continental Cluster

To set up: Three Data Center Architecture
Read:
    Understanding and Designing Serviceguard Disaster Tolerant Architectures
    • Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
    Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
    • Chapter 1: Designing a Metropolitan Cluster
    • Chapter 2: Designing a Continental Cluster
    • Chapter 6: Designing a Disaster Tolerant Solution Using the Three Data Center Architecture

Online versions of these documents and other HA documentation are available at http://www.docs.hp.com -> High Availability.

Related Publications

The following documents contain additional useful information:
• Clusters for High Availability: a Primer of HP Solutions, Second Edition. Hewlett-Packard Professional Books: Prentice Hall PTR, 2001 (ISBN 0-13-089355-2)
• Managing Serviceguard Fourteenth Edition
• Understanding and Designing Serviceguard Disaster Tolerant Architectures (B7660-90020)
• Using Serviceguard Extension for RAC (T1859-90038)
• Using High Availability Monitors (B5736-90025)
• Using the Event Monitoring Service (B7612-90015)

When using VxVM storage with Serviceguard, refer to the following:
• VERITAS Volume Manager Administrator’s Guide. This contains a glossary of VERITAS terminology.
• VERITAS Volume Manager Storage Administrator Administrator’s Guide
• VERITAS Volume Manager Reference Guide
• VERITAS Volume Manager Migration Guide
• VERITAS Volume Manager for HP-UX Release Notes

Use the following URL to access HP’s High Availability web page:
• http://www.hp.com/go/ha

Use the following URL for access to a wide variety of HP-UX documentation:
• http://docs.hp.com/hpux

Problem Reporting

If you have problems with HP software or hardware products, please contact your HP support representative.

1 Designing a Metropolitan Cluster

This chapter describes the configuration and management of a basic metropolitan cluster through the following topics:
• Designing a Disaster Tolerant Architecture for use with Metrocluster Products
• Single Data Center
• Two Data Centers and Third Location with Arbitrator(s)
• Package Configuration Worksheet
• Disaster Tolerant Checklist
• Cluster Configuration Worksheet
• Next Steps

In addition, this chapter outlines the general characteristics of the metropolitan cluster solutions that are provided with use of the following products:
• Metrocluster with Continuous Access XP
• Metrocluster with Continuous Access EVA
• Metrocluster with EMC SRDF

A separate chapter details the configuration process for each storage solution. For additional information, refer to the Release Notes for your metropolitan cluster product and the documentation for your storage solution.

Designing a Disaster Tolerant Architecture for use with Metrocluster Products

Metrocluster is designed for use in a metropolitan cluster environment within the 100 km distance limit. All nodes must be members of a single Serviceguard cluster. Two configurations are supported:
• A single data center without arbitrators (not disaster tolerant).
• Two data centers and a third location architecture with one or two arbitrator systems or a quorum server system. See Figure 1-1 (page 27).
Specifically for disaster tolerance, Serviceguard clusters or data centers can also be configured on different subnets. Such configurations provide improved scalability because operators can configure a larger number of nodes with more IP addresses. Follow these guidelines when configuring a Serviceguard cluster across network subnets:
• All the nodes in the cluster must belong to the same domain.
• The latency period in the heartbeat network that is configured across subnets must be less than 200 milliseconds.
• A minimum of two heartbeat subnets must be configured for all cluster nodes.
• Each heartbeat subnet on a node must be routed using a different physical route to the other heartbeat subnet on the other node.
• Redundant physical networks need to be cabled separately between sites to maintain high availability.
• Each subnet that is used by a package must be configured with a standby interface in the local bridged network.

For more information on configuring cross subnet clusters, see the Managing Serviceguard manual available at http://www.docs.hp.com.

Following are the disaster tolerant architecture requirements:
• In the disaster tolerant cluster architecture, it is expected that each data center is self-contained such that the loss of one data center does not cause the entire cluster to fail. It is important that all single points of failure (SPOF) be eliminated so that surviving systems continue to run in the event that one or more systems fail.
• It is also expected that the networks between the data centers are redundant and routed in such a way that the loss of any one data center does not cause the network between surviving data centers to fail.
• Exclusive volume group activation must be used for all Volume Groups (VG) associated with packages that use the disk arrays in the Metrocluster environment. The design of the Metrocluster Continuous Access script assumes that only one system in the cluster will have a VG activated at any time.

Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature called the Site Controller package to provide disaster tolerance for workload databases. This solution is currently implemented for Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323).

Single Data Center

A single data center architecture is supported, but it is not a true disaster tolerant architecture. If the entire data center fails, there will be no automated failover. This architecture is only valid for protecting data through data replication, and for protecting against multiple node failures.

Two Data Centers and Third Location with Arbitrator(s)

This is the recommended and supported disaster tolerant architecture for use with a metropolitan cluster. This architecture consists of two main data centers with an equal number of nodes and a third location with one or more arbitrator nodes or a quorum server node. See Figure 1-1.
Figure 1-1 Two Data Centers and Third Location with Arbitrators
(Figure: Data Center A and Data Center B, each with two nodes and an XP disk array, PVOL-to-SVOL replicated data for packages A, B, C, and D over redundant Continuous Access links, redundant networks and PV links, separate power circuits, and a third location with arbitrator 1 and arbitrator 2 or a Quorum Server.)

A disk array can be the main disk array for one set of packages and the remote disk array for another. In Figure 1-1, the XP disk array in data center A is the main or primary disk array for packages A and B, and the remote or secondary disk array for packages C and D in data center B. For packages A and B, data is written to PVOLs on the array in Data Center A and replicated to SVOLs on the array in Data Center B. Likewise, the XP disk array in Data Center B is the primary or main disk array for packages C and D, and the secondary or remote for packages A and B. For packages C and D, data is written to PVOLs on the disk array in Data Center B and replicated to SVOLs in Data Center A.

Arbitrators provide functionality like that of the cluster lock disk, and act as tie-breakers for a cluster quorum in case all of the nodes in one data center go down at the same time. Cluster lock devices are not supported because cluster locks cannot be maintained across the replication link, such as Continuous Access or SRDF. Arbitrators are fully functioning systems that are members of the cluster, and are not usually physically connected to the disk arrays. A Quorum Server is an alternative form of cluster arbitration that uses a server program to determine cluster membership rather than a cluster lock disk or a Serviceguard Arbitration Node.

Table 1-1 lists the allowable number of nodes at each main data center and the third location, up to a 16-node maximum cluster size.

Table 1-1 Supported System and Data Center Combinations

Data Center A    Data Center B    Data Center C            Serviceguard Version
1                1                1 Arbitrator Node        A.11.13 or later
1                1                Quorum Server System     A.11.13 or later
2                1                2 Arbitrator Nodes       A.11.13 or later
1                2                2 Arbitrator Nodes       A.11.13 or later
2                2                1 Arbitrator Node        A.11.13 or later
2                2                2* Arbitrator Nodes      A.11.13 or later
2                2                Quorum Server System     A.11.13 or later
3                3                1 Arbitrator Node        A.11.13 or later
3                3                2* Arbitrator Nodes      A.11.13 or later
3                3                Quorum Server System     A.11.13 or later
4                4                1 Arbitrator Node        A.11.13 or later
4                4                2* Arbitrator Nodes      A.11.13 or later
4                4                Quorum Server System     A.11.13 or later
5                5                1 Arbitrator Node        A.11.13 or later
5                5                2* Arbitrator Nodes      A.11.13 or later
5                5                Quorum Server System     A.11.13 or later
6                6                1 Arbitrator Node        A.11.13 or later
6                6                2* Arbitrator Nodes      A.11.13 or later
6                6                Quorum Server System     A.11.13 or later
7                7                1 Arbitrator Node        A.11.13 or later
7                7                2* Arbitrator Nodes      A.11.13 or later
7                7                Quorum Server System     A.11.13 or later
8                8                Quorum Server System     A.11.13 or later

* Configurations with two arbitrators are preferred because they provide a greater degree of availability, especially in cases when a node is down due to a failure or planned maintenance.
It is highly recommended that two arbitrators be configured in Data Center C to allow for planned downtime in Data Centers A and B. The following is a list of recommended arbitration methods for Metrocluster solutions, in order of preference:
• 2 arbitrator nodes, where supported
• 1 arbitrator node, where supported
• Quorum Server with APA
• Quorum Server

For more information on Quorum Server, refer to the Serviceguard Quorum Server Release Notes for HP-UX.

NOTE: In the metropolitan environment, the same number of systems must be present in each of the two data centers (Data Center A and Data Center B) whose systems are connected to the XP disk arrays. There must be either one or two arbitrators or a Quorum Server in a third location.

Arbitrator Node Configuration Rules

Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages as well as providing better protection against multiple points of failure. Using two arbitrators:
• Provides local failover capability to applications running on the arbitrator.
• Protects against multiple points of failure (MPOF).
• Provides for planned downtime on a single system anywhere in the cluster.

If you use a single arbitrator system, special procedures must be followed during planned downtime to remain protected. Systems must be taken down in pairs, one from each of the data centers, so that the Serviceguard quorum is maintained after a node failure. If the arbitrator itself must be taken down, disaster recovery capability is at risk if one of the other systems fails.

Arbitrator systems can be used to perform important and useful work such as:
• Hosting mission-critical applications not protected by disaster recovery software
• Running monitoring and management tools such as IT/Operations or Network Node Manager
• Running backup applications such as Omniback
• Acting as application servers

Disk Array Data Replication Configuration Rules

Each disk array must be configured with redundant links for data replication. To prevent a single point of failure (SPOF), there must be at least two physical boards in each disk array for the data replication links. Each board usually has multiple ports. However, a redundant data replication link must be connected to a port on a different physical board from the board that has the primary data replication link.

For Continuous Access XP, when using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations in order to allow failback.

Calculating a Cluster Quorum

When a cluster initially forms, all systems must be available to form the cluster (100% quorum requirement). A quorum is dynamic and is recomputed after each system failure. For instance, if you start out with an 8-node cluster and two systems fail, that leaves 6 out of 8 surviving nodes, or a 75% quorum. The cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, quorum is 67%. Each time a cluster forms, there must be more than 50% quorum to reform the cluster. With Serviceguard, a cluster lock disk or Quorum Server is used as the tie-breaker when quorum is exactly 50%. However, with a Metrocluster configuration, a Quorum Server is supported and a cluster lock disk is not supported.
Therefore, a quorum of 50% will require access to a Quorum Server, otherwise all nodes will halt.

Example Failover Scenarios with One Arbitrator

Taking a node off-line for planned maintenance is treated the same as a node failure in these scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned maintenance.

Figure 1-2 Failover Scenario with a Single Arbitrator
(Figure: nodes 1 and 2 running pkg A and pkg B in Data Center A, nodes 3 and 4 running pkg C and pkg D in Data Center B, Continuous Access links between the data centers, and arbitrator 1 at the third location.)

The scenarios in Table 1-2, based on Figure 1-2, illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.

Table 1-2 Node Failure Scenarios with One Arbitrator

Failure                                           Quorum          Result
arbitrator 1                                      4 of 5 (80%)    no change
node 1                                            4 of 5 (80%)    pkg A switches
node 1, then node 2                               3 of 4 (75%)    pkg A and B switch
node 1, 2, then arbitrator 1                      2 of 3 (67%)    pkg A and B switch
nodes 1, 2, arbitrator 1, then node 3             1 of 2 (50%)    cluster halts*
arbitrator 1, then node 1                         3 of 4 (75%)    pkg A switches
data center A (nodes 1 and 2)                     3 of 5 (60%)    pkg A and B switch to data center B
data center A, then arbitrator 1                  2 of 3 (67%)    pkg A and B switch, then no change
data center A and arbitrator 1                    2 of 5 (40%)    cluster halts*
data center A, then arbitrator 1, then node 3     1 of 2 (50%)    cluster halts*
arbitrator 1, then data center A                  2 of 4 (50%)    cluster halts*
node 3, then data center A                        2 of 4 (50%)    cluster halts*
data center B                                     3 of 5 (60%)    pkg C and D switch to data center A
third location                                    4 of 5 (80%)    no change

* Cluster can be manually started with the remaining node.

With a single arbitrator node, the cluster is at risk each time a node fails or comes down for planned maintenance.

Example Failover Scenarios with Two Arbitrators

Having two arbitrator nodes adds extra protection during node failures and allows you to do planned maintenance on arbitrator nodes without losing the cluster should a disaster occur.

Figure 1-3 Failover Scenario with Two Arbitrators
(Figure: the same two data centers and packages as Figure 1-2, with arbitrator 1 and arbitrator 2 at the third location.)

The scenarios in Table 1-3 illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators. Note that 3 of the 4 scenarios that caused a cluster halt with a single arbitrator do not cause a cluster halt with two arbitrators.

Table 1-3 Node Failure Scenarios with Two Arbitrators

Failure                                           Quorum          Result
data center A (nodes 1 and 2)                     4 of 6 (67%)    pkg A and B switch to data center B
data center A, then arbitrator 1                  3 of 4 (75%)    pkg A and B switch, then no change
data center A and arbitrator 1                    3 of 6 (50%)    cluster halts*
data center A, then arbitrator 1, then node 3     2 of 3 (67%)    pkg A, B, and C switch
arbitrator 1, then data center A                  3 of 5 (60%)    pkg A and B switch to data center B
node 3, then data center A                        3 of 5 (60%)    pkg A and B switch to data center B
data center B                                     4 of 6 (67%)    pkg C and D switch to data center A
third location                                    4 of 6 (67%)    no change

* Cluster can be manually started with the remaining node.

Worksheets

Disaster Tolerant Checklist

Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a configuration with two main data centers and a third location.
Figure 1-4 Disaster Tolerant Checklist

• Data centers A and B have the same number of nodes to maintain quorum in case an entire data center fails.
• Arbitrator nodes or Quorum Server nodes are located in a separate location from either of the primary data centers (A or B).
• The elements in each data center, including nodes, disks, network components, and climate control, are on separate power circuits.
• Each node is configured with PV links.
• Each disk array is configured with redundant replication links, each of which is connected to a different LCP or RCP card or Controller.
• At least two networks are configured to function as the cluster heartbeat.
• All redundant cabling for network, heartbeat, and replication links is routed using different physical paths.

Cluster Configuration Worksheet

Use this cluster configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user’s guide. If you have already completed a Serviceguard cluster configuration worksheet, you only need to complete the first part of this worksheet.

Figure 1-5 Cluster Configuration Worksheet

Name and Nodes
    Cluster Name:
    Data Center A Name and Location:
    Node Names:
    Data Center B Name and Location:
    Node Names:
    Arbitrator/Quorum Server Third Location Name & Location:
    Arbitrator Node/Quorum Server Names:
    Maximum Configured Packages:

Subnets
    Heartbeat IP Addresses:
    Non-Heartbeat IP Addresses:

Timing Parameters
    Heartbeat Interval:
    Node Time-out:
    Network Polling Interval:
    AutoStart Delay:

Package Configuration Worksheet

Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user’s guide. If you have already completed a Serviceguard package configuration worksheet, you only need to complete the first part of this worksheet.
Figure 1-6 Package Configuration Worksheet

Package Configuration File Data
    Package Name:
    Primary Node:                   Data Center:
    First Failover Node:            Data Center:
    Second Failover Node:           Data Center:
    Third Failover Node:            Data Center:
    Fourth Failover Node:           Data Center:
    Package Run Script:             Timeout:
    Package Halt Script:            Timeout:
    Maximum Configured Packages:

XP Series Volume Configuration
    Device Group    Device Name    Port #    Target ID    LUN

EMC SRDF Series Volume Configuration
    Device Group    Device Name    Port #

Figure 1-7 Package Control Script Worksheet

Package Control Script Data
    VG[0]:          LV[0]:    FS[0]:    FS_MOUNT_OPT[0]:
    VG[1]:          LV[1]:    FS[1]:    FS_MOUNT_OPT[1]:
    VG[2]:          LV[2]:    FS[2]:    FS_MOUNT_OPT[2]:
    VXVM_DG[0]:     LV[0]:    FS[0]:    FS_MOUNT_OPT[0]:
    VXVM_DG[1]:     LV[1]:    FS[1]:    FS_MOUNT_OPT[1]:
    VXVM_DG[2]:     LV[2]:    FS[2]:    FS_MOUNT_OPT[2]:
    IP[0]:          SUBNET[0]:
    IP[1]:          SUBNET[1]:
    X.25 Resource Name:
    Service Name:    Run Command:    Retries:    RetryTime:
        Service Fail Fast Enabled?:    Service Halt Timeout:
    Service Name:    Run Command:    Retries:    RetryTime:
        Service Fail Fast Enabled?:    Service Halt Timeout:

Next Steps

To implement the metropolitan cluster design, use the procedures in the following sections:
• “Completing and Running a Continental Cluster Solution with Continuous Access XP” (page 171)
• “Building a Continental Cluster Solution with EMC SRDF” (page 274)
• “Building a Metrocluster Solution with Continuous Access EVA” (page 209)

2 Designing a Continental Cluster

Unlike metropolitan and campus clusters, which have a single-cluster architecture, a continental cluster uses multiple Serviceguard clusters to provide application recovery over a local or wide area network (LAN or WAN). Using the Continentalclusters product, two independently functioning clusters are set up in such a way that, in the event of a disaster, one cluster can take over the critical operations formerly carried out by the other cluster. Disaster tolerance is obtained by eliminating the cluster itself as a single point of failure.

This chapter describes the configuration and management of a basic continental cluster through the following topics:
• Understanding Continental Cluster Concepts
• Designing a Disaster Tolerant Architecture for use with Continentalclusters
• Preparing the Clusters
• Building the Continentalclusters Configuration
• Testing the Continental Cluster
• Switching to the Recovery Packages in Case of Disaster
• Restoring Disaster Tolerance
• Performing a Rehearsal Operation in your Environment
• Maintaining a Continental Cluster
• Support for Oracle RAC Instances in a Continentalclusters Environment

Refer to Appendix D and Appendix E for additional information on the Continentalclusters command set and on configuration file parameters. For details of cascading failover using HP StorageWorks or EMC Symmetrix disk arrays, contact your HP representative.

NOTE: This chapter briefly addresses data replication, highly available WANs, and site security and communication. Chapters 3, 4 and 5 give details on physical data replication using the HP StorageWorks Disk Array XP Series with Continuous Access XP, the HP StorageWorks Disk Array EVA Series with Continuous Access EVA, and the EMC Symmetrix with the SRDF facility.

Understanding Continental Cluster Concepts

The Continentalclusters product provides the ability to monitor a high availability cluster and fail over mission critical applications to another cluster if the monitored cluster should become unavailable.
In the following example, the Los Angeles cluster runs the mission critical application and replicates data to the New York cluster, which has another copy of the mission critical application ready to run in case of failover. In addition, Continentalclusters supports mutual recovery, which allows for different critical applications to be run on each cluster, with each cluster configured to recover the mission critical applications of the other.

Because clusters may be separated over wide geographical distances, and because they have independent function, the operation of clusters in a Continentalclusters configuration is somewhat different from that of typical Serviceguard clusters. A typical Continentalclusters recovery pair environment is shown in Figure 2-1.

Figure 2-1 Sample Continentalclusters Configuration
(Figure: the Los Angeles cluster, LAnode1 and LAnode2 with a local disk array, running salespkg and custpkg; a highly available network and WAN data replication links through ESCON/WAN converters connect it to the New York cluster, NYnode1 and NYnode2, which runs the monitor and has its own disk array.)

Two packages are running on the cluster in Los Angeles, and their data is replicated to the cluster in New York. Physical data replication is carried out using ESCON (Enterprise Storage Connect) links between the disk array hardware in New York and Los Angeles via an ESCON/WAN converter at each end. The New York cluster is running a monitor that checks the status of the Los Angeles cluster.

In this example, the Los Angeles cluster runs just like any Serviceguard cluster, with applications configured in packages that may fail from node to node as necessary. The New York cluster is configured with a recovery version of the packages that are running on the Los Angeles cluster. These packages do not run under normal circumstances, but are set to start up when they are needed. In addition, either cluster may run other packages that are not involved in Continentalclusters operation.

Mutual Recovery Configuration

Bi-directional failover is supported in what is called a “mutual recovery configuration.” This allows recovery groups to be defined for primary packages running in both component clusters of a recovery pair in the Continentalclusters configuration. Figure 2-2 shows a mutual recovery configuration.

Figure 2-2 Sample Mutual Recovery Configuration
(Figure: the New York cluster runs salespkg and a monitor, the Los Angeles cluster runs custpkg and a monitor; the clusters are connected by a highly available network and WAN data replication links through ESCON/WAN converters, each with its own disk array.)

In the above figure, the salespkg is running on the New York cluster and can be recovered by the Los Angeles cluster. Similarly, the custpkg running on the Los Angeles cluster can be recovered by the New York cluster. As stated previously, physical data replication is carried out using ESCON (Enterprise Storage Connect) links between the disk array hardware in New York and Los Angeles via an ESCON/WAN converter at each end. Each cluster is running a monitor that checks the status of the alternate cluster.

As depicted in the above example, each cluster runs just like any Serviceguard cluster, with applications configured in packages that may fail from node to node as necessary. Each cluster is configured with a recovery version of the packages that are running on the alternate cluster. These packages do not run under normal circumstances, but are set to start up when they are needed.
In addition, either cluster may run other packages that are not involved in Continentalclusters operation.

Application Recovery in a Continental Cluster

If a given cluster in a recovery pair of a continental cluster should become unavailable, Continentalclusters allows an administrator to issue a single command, cmrecovercl (described later), to transfer mission critical applications from that cluster to another cluster, making sure that the packages do not run on both clusters at the same time. Transfer is not automatic, although it is automated through a recovery command, which a root user must issue. The result after issuing the recovery command is shown in Figure 2-3.

Figure 2-3 Continental Cluster After Recovery
(Figure: after recovery, the New York cluster, NYnode1 and NYnode2, runs salespkg and custpkg_bak on its disk array; WAN data replication links connect it to the unavailable Los Angeles cluster, LAnode1 and LAnode2.)

The movement of an application from one cluster to another cluster does not replace local failover activity; packages are normally configured to fail over from node to node as they would on any high availability cluster. Cluster recovery, failover of packages to a different cluster, occurs only after the following events:
• Continentalclusters detects the problem
• Continentalclusters sends a notification of the problem
• Verify that the monitored cluster has failed
• Issue the cluster recovery command

Monitoring over a Network

A monitor package running on one cluster tracks the health of another cluster in the recovery pair and sends notification to configured destinations if the state of the monitored cluster changes. (If a cluster contains any packages to be recovered, it must be monitored.) The monitor software polls the monitored cluster at a specific MONITOR_INTERVAL defined in an ASCII configuration file, which also indicates when and where to send messages if there is a state change.

The physical separation between clusters will require communication by way of a Local or Wide Area Network (LAN or WAN). Since the polling takes place across the network, interruptions of network service cannot always be differentiated from cluster failure states. This means that if the network is unreliable, the monitoring facility will often detect and report an unreachable state for the monitored cluster that is actually an interruption of the network service. Because the monitoring is indeterminate in some instances, information from independent sources must be gathered to determine the need for proceeding with the recovery process.

For these reasons, cluster recovery is not automatic, but must be initiated by a root user. Once initiated, however, the cluster recovery is automated to reduce the chance of human error that might occur if manual steps were needed. In Continentalclusters, a system of cluster events and notifications is provided so that events can be easily tracked, and users will know when to seek additional information before initiating recovery.

Cluster Events

A cluster event is a change of state in a monitored cluster. The four cluster states reported by the monitor are Unreachable, Down, Up, and Error. Table 2-1 summarizes possible causes for the cluster events with regard to both the monitored cluster and the network. However, in many cases the causes of cluster events are indeterminate without additional information that is not available to the software.
Table 2-1 Monitored States and Possible Causes

Up -> Unreachable
    Cluster-related causes: Cluster went down; no nodes are responding to network inquiries.
    Network-related causes: Network failure.
Down -> Unreachable
    Cluster-related causes: Cluster was down and nodes are no longer responding.
    Network-related causes: Network failure.
Error -> Unreachable
    Cluster-related causes: Error resolved but cluster down and nodes not responding.
    Network-related causes: Network failure.
Up -> Down
    Cluster-related causes: Cluster has been halted, but at least one node is still responding to network inquiries.
    Network-related causes: No network problems.
Error -> Down
    Cluster-related causes: Error resolved, cluster is down.
    Network-related causes: Network problem was fixed, cluster is down.
Unreachable -> Down
    Cluster-related causes: Cluster nodes were rebooted but the cluster was not started.
    Network-related causes: Network came up but the cluster was not running.
Up -> Error
    Cluster-related causes: Serviceguard version or security file mismatch, software error.
    Network-related causes: Network is misconfigured, or DNS server crashed or set up incorrectly.
Down -> Error
    Cluster-related causes: Serviceguard version or security file mismatch, software error.
    Network-related causes: Network is misconfigured, or DNS server crashed or set up incorrectly.
Unreachable -> Error
    Cluster-related causes: Serviceguard version or security file mismatch, software error.
    Network-related causes: Network problem was fixed, but the error condition still exists.
Down -> Up
    Cluster-related causes: Cluster started.
    Network-related causes: No network problems.
Unreachable -> Up
    Cluster-related causes: Cluster nodes were rebooted and the cluster started.
    Network-related causes: Network came up and the cluster was already running.
Error -> Up
    Cluster-related causes: Error resolved, cluster is up.
    Network-related causes: Network problem was fixed, cluster is up.

NOTE: There is only one condition under which cmclsentryd will determine that the cluster has Error status: all nodes are unreachable except those which have Serviceguard Error status. (If any nodes are Down or Up, then the cluster status will take one of those values, rather than Error.)

Interpreting the Significance of Cluster Events

Because some cluster events (for example, Up -> Unreachable) can be caused by changes in either a cluster state or a network state, additional independent information is required to achieve the primary objective of determining whether you need to recover a cluster’s applications. Sources of independent information include:
• Contact with the network provider
• Contact with the administrator of the monitored cluster
• Contact with the local cluster administrator
• Contact with company executives

When problematic cluster events persist, obtain as much information as possible, including authorization to recover, if your business practices require this, and then issue the Continentalclusters recovery command, cmrecovercl.

How Notifications Work

A central part of the operation of Continentalclusters is the transmission of notifications following the detection of a cluster event. Notifications occur at specifically coded times, and at two different levels:
• Alert — when a cluster event should be considered noteworthy.
• Alarm — when an event shows evidence of a cluster failure.

Notifications are typically sent as:
• Email messages
• SNMP traps
• Text log files
• OPC messages to OpenView IT/Operations

In addition, notifications are sent to the eventlog file located in the /var/opt/resmon/log/cc directory on the system where monitoring is taking place.

NOTE: An email message can be sent to an address supplied by a pager service that will forward the message to a specified pager system.
(Contact your pager service provider for more information.)

Alerts

Alerts are intended as informational. Some typical uses of alerts include:
• Notification that a cluster has been halted for a significant amount of time.
• Notification that a cluster has come up after being down or unreachable.
• Notification that a cluster came down for any reason.
• Notification that a cluster has been in an unreachable state for a short period of time. An alert is sent in this case as a warning that an alarm might be issued later if the cluster’s state remains unreachable for a longer time.

The expected process in dealing with alerts is to continue watching for additional notifications and to contact individuals at the site of the monitored cluster to see whether problems exist.

Alarms

Alarms are intended to indicate that a cluster failure might have taken place. The most common example of an alarm is the following:
• Notification that a specified cluster has been in an unreachable state for a significant amount of time.

The expected process in dealing with cluster events that persist at the alarm level is to obtain as much information as possible, including authorization to recover, if your business practices require this. At that point, issue the Continentalclusters recovery command, cmrecovercl.

Creating Notifications for Failure Events

For events that indicate potential cluster failure, indicate the escalating level of concern about cluster health by defining alerts followed by one or more alarms. The following is a typical sequence:
• cluster alert at 5 minutes
• cluster alert at 10 minutes
• cluster alarm at 15 minutes

This could be accomplished by entering two CLUSTER_ALERT lines in the configuration file, and one CLUSTER_ALARM line. A detailed example is provided in the comments in the ASCII configuration file template, shown in “Editing Section 3—Monitoring Definitions” (page 79).

Creating Notifications for Events that Indicate a Return of Service

For those events that indicate that the cluster is back online or that communication with the monitor has been restored, use cluster alerts to show the de-escalation of concern. In this case, use a CLUSTER_ALERT line in the configuration file with a time of zero (0), so that notifications are sent as soon as the return to service is detected.

Maintenance Mode for Recovery Groups

Placing a recovery group in the maintenance mode exempts it from recovery; the recovery package cannot be started in the recovery cluster. By default, all recovery groups in the Continentalclusters configuration are not in the maintenance mode. To move a recovery group in Continentalclusters into the maintenance mode, you must disable it. To move a recovery group out of the maintenance mode, you must enable it. You can complete rehearsal operations on a recovery group only when the recovery group is in the maintenance mode. For more information on rehearsal operations, see “Performing a Rehearsal Operation in your Environment” (page 106).

Use the cmrecovercl -d -g <recovery_group_name> command to move a recovery group into the maintenance mode. To move the recovery group out of the maintenance mode, use the cmrecovercl -e -g <recovery_group_name> command.

Maintenance mode for recovery groups is an optional feature. You must explicitly configure Continentalclusters to use this feature.
Consider the following guidelines when you move a recovery group into the maintenance mode:
• Configure a shared disk with a file system in all primary clusters and in the recovery cluster. This shared disk is local to the cluster and need not be replicated.
• Configure the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file with an absolute path to a file system. Create this path on all nodes and reserve it for Continentalclusters. This file system is used to hold the current maintenance mode setting for recovery groups.
• Configure the monitor package control script to mount the file system in the shared disk at the path specified with the CONTINENTAL_CLUSTER_STATE_DIR parameter.

When a recovery group is in the maintenance mode, startup of a recovery package with the cmrecovercl, cmrunpkg, or cmmodpkg commands is prevented by Continentalclusters for that recovery group.

When a recovery group is in the maintenance mode, there is no impact on the availability of the primary packages. The primary package continues to be up and is highly available within the primary cluster (that is, local failover is allowed). Clients can continue to connect to the primary package and access its production data on the primary cluster.

There is no dependency on data replication to move a recovery group into the maintenance mode. Array based replication can be suspended or can be in progress. Similarly, logical replication can either be suspended (receiver package is down) or resumed (receiver package is up).

Table 2-2 describes the impact on recovery when a recovery group is in the maintenance mode.

Table 2-2 Impact of Maintenance Mode

Default Mode:
    Recovery package startup using the cmrecovercl, cmrunpkg, cmmodpkg, or cmforceconcl commands is allowed. Cross-checking is done between primary and recovery packages to ensure both packages are not up at the same time. The primary package is allowed to start only if the recovery package is down. Similarly, a recovery package is allowed to start only if the primary package is down.

Maintenance Mode:
    Recovery package startup using the cmrecovercl, cmrunpkg, cmmodpkg, or cmforceconcl commands is not allowed. There is no impact to primary packages; the primary package continues to run irrespective of the mode.

Moving a Recovery Group into Maintenance Mode

Run the following command to disable a recovery group and move it into the maintenance mode:

cmrecovercl -d [-f] -g <recovery_group_name>

Where <recovery_group_name> is the name of the recovery group to be disabled.

Run this command only on recovery cluster nodes. This command succeeds only when Continentalclusters is configured for maintenance mode. The command checks for the following conditions to successfully disable a recovery group:
• The recovery package is down and package switching is disabled.
• The primary cluster and the primary package are up. If the cluster is down or unreachable, use the force (-f) option to forcefully disable the recovery group.

  WARNING! When you use the force option, ensure that the primary package and the cluster are not down due to a primary site failure.

• The monitor package is up and running in the recovery cluster.

Moving a Recovery Group out of the Maintenance Mode

Run the following command to enable a recovery group and move it out of the maintenance mode:

cmrecovercl -e -g <recovery_group_name>

Where <recovery_group_name> is the name of the recovery group to be enabled and moved out of the maintenance mode.

You can run this command only on recovery cluster nodes.
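For illustration only, the following sequence shows the two operations on a recovery cluster node, assuming a hypothetical recovery group named sales_rg that has already been configured for maintenance mode (substitute your own recovery group name).

To move the recovery group into the maintenance mode before maintenance or rehearsal work:

# cmrecovercl -d -g sales_rg

To move the recovery group out of the maintenance mode after the work is complete and replication has been restored:

# cmrecovercl -e -g sales_rg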
The command succeeds only when Continentalclusters is configured for maintenance mode. The following conditions must be met for the recovery group to be enabled and moved out of the maintenance mode:
• For recovery groups configured with a rehearsal package, the rehearsal package is halted and package switching is disabled.
• The monitor package is up and running in the recovery cluster.

Performing Cluster Recovery

When a CLUSTER_ALARM is issued, there may be a need for a cluster recovery using the recovery command, cmrecovercl, which is enabled for use by the root user. Cluster recovery is carried out at the site of the recovery cluster by using the cmrecovercl command. The cmrecovercl command will only recover recovery groups that are enabled for recovery and are not in the maintenance mode.

# cmrecovercl

Issuing this command will halt any configured data replication activity from the failed cluster to the recovery cluster, and will start all configured recovery packages on the recovery cluster that are pre-configured in recovery groups. A recovery group is the basic unit of recovery used in a continental cluster configuration. This command will fail if a cluster alarm has not been issued.

If the option -g RecoveryGroup is specified with the recovery command, then the recovery process of halting data replication activity and starting the recovery package is performed only for the specified recovery group.

After the cmrecovercl command is issued, there is a delay of at least 90 seconds (per recovery group) while the command ensures that the package is not active on another cluster.

Cluster recovery is done as a last resort, after all other approaches to restore the unavailable cluster have been exhausted. It is important to remember that cluster recovery sets in motion a process that cannot be easily reversed. Unlike the failover of a package from one node to another, failing a package from one cluster to another normally involves a significant quantity of data that is being accessed from a new set of disks. Returning control to the original cluster will involve resynchronizing this data and resetting the roles of the clusters in a process that is easier for some data replication techniques than others.

NOTE: After a recovery, it is not possible to reverse directions and return a package to its original cluster without first reconfiguring the data replication hardware and/or software and synchronizing data. Therefore, be very cautious when deciding to use the cmrecovercl command. For this reason, HP recommends that stringent procedures and processes are in place to aid in making the decision to complete a recovery process.

Performing Recovery Group Rehearsal in Continentalclusters

During a recovery in Continentalclusters, a configuration inconsistency at the recovery cluster can result in an unsuccessful recovery attempt. Rehearsing the recovery procedure provides a method to proactively identify and fix these configuration inconsistencies so that there are no issues during an actual recovery. Continentalclusters provides the environment and a set of required tools to complete a Disaster Recovery (DR) Rehearsal.

Continentalclusters allows recovery groups to be configured with a special rehearsal package for DR rehearsal. You must configure the rehearsal package in the recovery cluster and specify it as part of the recovery group definition.
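For illustration only, a recovery group definition that includes a rehearsal package might look like the following minimal sketch in the Continentalclusters ASCII configuration file. The cluster names (losangeles, newyork), package names, and exact parameter spellings shown here are illustrative assumptions; refer to “Configuring Recovery Groups with Rehearsal Packages” (page 64) and the Continentalclusters configuration file template for the exact syntax.

RECOVERY_GROUP_NAME    sales_rg
PRIMARY_PACKAGE        losangeles/salespkg
RECOVERY_PACKAGE       newyork/salespkg_bak
REHEARSAL_PACKAGE      newyork/salespkg_reh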
The rehearsal package is identical to the recovery package and can be effectively used in place of the recovery package to verify the environment. The rehearsal package bundles the same application and uses the same storage devices as the recovery package. During a DR rehearsal, Continentalclusters starts the rehearsal package and validates the recovery cluster environment. However, to stop clients from using the recovery package application instance, the client access network IP address must be different for the rehearsal package. For more information on configuring a rehearsal package, see “Configuring Recovery Groups with Rehearsal Packages” (page 64).

Also, you must configure Continentalclusters to enable the maintenance mode feature for recovery groups. For more information on the maintenance mode, see “Maintenance Mode for Recovery Groups” (page 44).

Disaster Recovery Rehearsal for a recovery group is done in the following phases:

• Rehearsal Preparation Phase
  In this phase, you must prepare your environment for rehearsal. To prepare a recovery group for rehearsal, complete the following steps:
  1. Enter the cmrecovercl -d command to move the recovery group into the maintenance mode by disabling it.
  2. Suspend replication from the primary cluster.
  3. Prepare a business copy (BC/BCV) of the storage on the recovery cluster.
  4. Make the storage system read-write on the recovery cluster.
  5. Import the volume manager entities, such as LVM or SLVM volume groups or VxVM or CVM disk groups, in the recovery cluster.

• Rehearsal Run Phase
  Start the DR rehearsal for the recovery group using the Continentalclusters command cmrecovercl -r. This command starts the rehearsal package for the recovery group and validates the Continentalclusters configuration and environment for the recovery group. Once the rehearsal is started, use the regular Serviceguard commands to manage the rehearsal package. You can run any test load on the rehearsal package to validate the recovery of the application.

• Rehearsal Restoration Phase
  Once the rehearsal process is complete, you must restore recovery for the recovery group. You must halt and disable the rehearsal package in the cluster and synchronize the recovery cluster storage with the primary site storage so that it holds the latest application data. Also, you must resume replication from the primary cluster. Finally, you must move the recovery group out of the maintenance mode by enabling it using the cmrecovercl -e command.

WARNING! Ensure that the storage system of the recovery group is synchronized with the latest data and the replication environment is restored before the recovery group is moved out of the maintenance mode. Failure to do so can result in the recovery package using production data that was invalidated by the rehearsal run during a subsequent recovery.

For information on running a rehearsal process in your environment, see Appendix G.

Notes on Packages in a Continental Cluster

Packages behave somewhat differently in a continental cluster than in a normal Serviceguard environment. There are specific differences in:
• Startup and Switching Characteristics
• Network Attributes

From Serviceguard A.11.17 and above, you can configure the following package types in a recovery group:
• Failover
• Oracle RAC Multi-node packages

In the case of a multi-node package, a recovery process recovers all instances of the package in a recovery cluster.
NOTE: System multi-node packages cannot be configured in Continentalclusters recovery groups. Multi-node packages are supported only for Oracle with CFS or CVM environments.
Startup and Switching Characteristics
Normally, an application (package) can run on only one node at a time in a cluster. In a continental cluster, however, there are two clusters in which an application—the primary package or the recovery package—could operate on the same data. The primary and the recovery package must not be allowed to run at the same time. To prevent this, it is important to ensure that packages are not allowed to start automatically and are not started at inappropriate times.
To keep packages from starting automatically when a cluster starts, set the AUTO_RUN parameter (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) for all primary and recovery packages to NO. Then use the cmmodpkg command with the -e option to start only the primary packages and enable switching. The cmrecovercl command, when run, starts the recovery packages and enables switching during the cluster recovery operation.
CAUTION: After initial testing is complete, the cmrunpkg and cmmodpkg commands, or the equivalent options in SAM, should never be used to start a recovery package unless cluster recovery has already taken place.
To prevent packages from being started at the wrong time and in the wrong place, use the following strategies:
• Set the AUTO_RUN parameter (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) for all primary and recovery packages to NO.
• Ensure that recovery package names are well known and that personnel understand they should never be started with a cmrunpkg or cmmodpkg command unless the cmrecovercl command has been invoked first.
• If a cluster has no packages to run before recovery, do not allow packages to be run on that cluster with Serviceguard Manager.
Network Attributes
Another important difference between packages configured in a continental cluster and packages configured in a standard Serviceguard cluster is that the same or different subnets can be used for the primary cluster and recovery cluster configurations. In addition, the same or different relocatable IP addresses can be used for the primary package and its corresponding recovery package. The client application must be designed to connect to the appropriate IP address following a recovery operation. For recovery groups with a rehearsal package configured, ensure that the rehearsal package IP address is different from the recovery package IP address.
How Serviceguard Commands Work in Continentalclusters
Continentalclusters packages are manipulated manually by the user with Serviceguard commands, and automatically by cmcld, in the same way as any other packages. In a continental cluster the recovery package is not allowed to run at the same time as the primary, data sender, or data receiver packages. To enforce this, several Serviceguard commands behave in a slightly different manner when used in a continental cluster. Table 2-3 describes the Serviceguard commands whose behavior is different in a continental cluster environment. Specifically, when one of the commands listed in Table 2-3 attempts to start or enable switching of a package, it first checks the status of the other packages in the recovery group. Based on the status, the operation is either allowed or disallowed.
The checking depends on a stable cluster environment and properly functioning network communication. If the network communication between the clusters cannot be established, or the cluster or package status cannot be determined, the status must be checked manually to ensure that the operation to be performed on the target package will not conflict with other packages configured in the same recovery group.
Table 2-3 Serviceguard and Continentalclusters Commands
cmrunpkg
How the command works in Serviceguard: runs a package.
How the command works in Continentalclusters: Will not start a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not start a recovery package if the recovery group is in maintenance mode. Will not start a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled. Will not start a rehearsal package when the recovery group is not in maintenance mode.
cmmodpkg -e
How the command works in Serviceguard: enables the switching attribute for a highly available package.
How the command works in Continentalclusters: Will not enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not enable switching for a recovery package if the recovery group is in maintenance mode. Will not enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled. Will not enable switching for a rehearsal package when the recovery group is not in maintenance mode.
cmhaltnode -f
How the command works in Serviceguard: halts a node in a highly available cluster.
How the command works in Continentalclusters: Will not re-enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not re-enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled.
cmhaltcl -f
How the command works in Serviceguard: halts the Serviceguard daemons on all currently running systems.
How the command works in Continentalclusters: Will not re-enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled. Will not re-enable a primary, data receiver, or data sender package if the recovery package in the same recovery group is running or enabled.
Designing a Disaster Tolerant Architecture for use with Continentalclusters
A recovery pair in a continental cluster consists of two Serviceguard clusters. One functions as the primary cluster and the other functions as the recovery cluster for a specific application. Prior to Continentalclusters version A.05.00, only one recovery pair could be configured in a continental cluster. Starting with Continentalclusters version A.05.00, a configuration of multiple recovery pairs is allowed. In a multiple recovery pair configuration, more than one primary cluster (where the primary packages are running) can be configured to share the same recovery cluster (where the recovery package is running).
The key elements providing disaster tolerance in a continental cluster recovery pair are:
• Mutual recovery
• Serviceguard clusters
• Data replication
• Highly available WAN networking
• Data center processes and procedures coordinated between the two cluster sites
There is a significant amount of latitude in selecting these elements for a configuration.
It is recommended the choices are recorded on worksheets which can be reviewed and updated periodically. Mutual Recovery For mutual recovery, any cluster in a continental cluster recovery pair may contain both primary and recovery packages for any recovery group. Recovery groups may be defined, for example, such that cluster A and cluster B contain recovery packages. In this case, cmrecovercl could be run on cluster B to recover packages from cluster A, or on cluster A to recover packages from cluster B. Serviceguard Clusters Each Serviceguard cluster in a continental cluster provides high availability for an application at the local level at that particular site. For optimal performance and to assure adequate capacity on the recovery cluster, it is best to have similar hardware on both clusters. For example, if one cluster contains two systems with 1Gb of memory each, it is not a good idea to have a low-end system with 128 Mb of memory in the other cluster. Each cluster may have as many nodes as are permitted in an ordinary Serviceguard cluster, and each may be running packages that are not configured to fail over between clusters. NOTE: Take note when cluster A takes over for cluster B, it must run cluster B’s packages as well as any packages that it was already running on its own, unless those packages are stopped intentionally. Data Replication Data replication between the Serviceguard clusters in a Continentalclusters recovery pair extends the scope of high availability to the level of the continental cluster. Select a technology for data replication between the two clusters. There are many possible choices, including: • • 52 Logical replication of databases Logical replication of file systems Designing a Continental Cluster • • Physical replication of data volumes via software Physical replication of disk units via hardware Table 2-4 is a brief discussion of how a data replication method affects a continental cluster environment. A detailed description of data replication can be found in Chapter 1, in the section titled “Disaster Tolerance and Recovery in a Serviceguard Cluster.” Specific guidelines for configuring the HP StorageWorks Disk Array XP Series, HP StorageWorks Disk Array EVA Series and the EMC Symmetrix Disk Array for physical data replication in a continental cluster are provided in Chapters 3, 4 and 5. In order to use these data replication solutions in a Continentalclusters environment it is necessary to purchase either the Metrocluster with Continuous Access XP, or Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF products separately. White papers describing specific implementations are also available at www.docs.hp.com -> High Availability If a data replication technology is chosen that is not mentioned above, and if the integration is performed independently, then it is necessary to use the guidelines described in section, “Using the Recovery Command to Switch All Packages” (page 95). In that case, note the following: • • Continentalclusters product is only responsible for the following: Continentalclusters configuration and management commands, the monitoring of remote cluster status, and the notification of remote cluster events. Continentalclusters product provides a single recovery command to start all recovery packages that are configured in the Continentalclusters configuration file. These recovery packages are typical Serviceguard's packages. 
Continentalclusters recovery command does not do any checking on the status of the devices and data that are used by the application prior to starting the recovery package. The user is responsible for checking the state of the devices and the data before executing Continentalclusters recovery command. Table 2-4 Data Replication and Continentalclusters Replication Type How it Works Continentalclusters Implication Logical Database Replication Transactions from the primary Requirements on CPU and I/O may limit application are applied from logs or prevent the Recovery Cluster from to a copy of the application running running additional applications. on the recovery site. (This is an example only; there are other methods.) Logical Filesystem Replication Writes to the filesystem on the CPU issues are the same as for Logical primary cluster and are duplicated Database Replication. The software may periodically on the recovery cluster. have to be managed as a separate Serviceguard package. Designing a Disaster Tolerant Architecture for use with Continentalclusters 53 Table 2-4 Data Replication and Continentalclusters (continued) Replication Type How it Works Continentalclusters Implication Physical Replication of Data Volumes via Software Disk mirroring via LVM software. Mirroring is done on disk links (SCSI or FibreChannel). Requirements on CPU are less than for logical replication, but there is still some CPU use. Distance limits may make this type of replication inappropriate for Continentalclusters. Physical Replication of Replication of the LUNs across disk Disk Units via Hardware arrays through dedicated hardware links such as EMC SRDF or Continuous Access XP or Continuous Access EVA. Limited CPU requirements, but the requirement of synchronous data replication slows replication, and may impair application performance. Increased network speed and bandwidth can remedy this. Logical data replication may require the use of packages to handle software processes that copy data from one cluster to another or that apply transactions from logs that are copied from one cluster to another. Some methods of logical data replication may use a logical replication data sender package, and others may use a logical replication data receiver package while some may use both. Logical replication data sender and receiver packages are configured as part of the data recovery group, as shown in section, “Preparing the Clusters” (page 59). Physical Data Replication using Special Environment files For physical data replication Continentalclusters uses pre-integrated solutions, which uses Continuous Access XP, Continuous Access EVA and EMC SRDF. In order to use these data replication solutions in a Continentalclusters environment it is necessary to purchase either the Metrocluster with Continuous Access XP, or Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF products separately. Physical data replication generally does not require the use of separate sender or receiver packages, but it does require specialized logic in the package control scripts to handle the transfer of control from the storage units of one cluster to the storage units at the other cluster. 
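As a hedged illustration of how this specialized logic is attached to a package: the Metrocluster products described in the next paragraph ship environment-file templates that are copied into the package directory and edited for the local array configuration. The package directory and target file name shown below are assumptions for illustration only; Chapters 3 through 5 give the exact file-naming rules and parameters for each array type.
# cp /opt/cmcluster/toolkit/SGCA/xpca.env /etc/cmcluster/salespkg/xpca.env
# vi /etc/cmcluster/salespkg/xpca.env    (set the array-specific parameters for this package)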
The packages that use physical data replication with the HP StorageWorks Disk Array XP Series with Continuous Access XP should have created a specific environment file using template /opt/cmcluster/toolkit/SGCA/xpca.env For packages that are using physical data replication with HP StorageWorks Disk Array EVA with Continuous Access EVA should be created using /opt/cmcluster/ toolkit/SGCA/caeva.env, and for packages that are using physical data replication with EMC Symmetrix and the SRDF facility should be created using /opt/cmcluster/ toolkit/SGSRDF/srdf.env. These templates can be purchased separately with the products Metrocluster with Continuous Access XP, or Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF. 54 Designing a Continental Cluster Details on configuring the special Continentalclusters control scripts are in Chapters 3, 4 and 5. Some additional notes are provided below. Multiple Recovery Pairs in a Continental Cluster One or more than one recovery pair can be configured in a continental cluster. In the Continentalclusters configuration that contains more than one recovery pair, more than one primary cluster is configured to share a common recovery cluster. Similar to the one recovery pair per continental cluster configuration, mutual recovery can also be configured in a multiple recovery pair scenario, as shown in Figure 2-4. The common recovery cluster can choose any one of the primary clusters as its recovery cluster. Data replication needs to be setup to allow for copying data from each primary cluster to the common recovery cluster. Each recovery pair should have its own data replication link. Different storage areas need to be configured with the common recovery cluster to receive data replicated from each primary clusters. The common recovery cluster should have enough capacity to serve the recovery purpose for all of the primary clusters configured to partner with it in a recovery pair. Designing a Disaster Tolerant Architecture for use with Continentalclusters 55 Figure 2-4 Multiple Recovery Pair Configuration in a Continental Cluster New York Cluster NYnode1 Highly Available Highly Available Network Los Angeles Cluster Disk Array WAN LAnode1 sales_bak monitor NYnode2 Disk Array LAnode2 HRpkg cust_bak Atlanta monitor Atlanta node2 Atlanta node1 Los Angeles Cluster Disk Array salespkg custpkg Data Replication Links between LA & Atlanta Data Replication Links between LA & NY Highly Available Wide Area Networking Disaster tolerant networking for Continentalclusters is directly tied to the data replication method. In addition to the reliability of the redundant lines connecting the remote nodes, it is important to consider what bandwidth is needed to support the data replication method that has been chosen. A continental cluster that handles a high number of write transactions per minute will not only require a highly available network, but also one with a large amount of bandwidth. Details on highly available networking can be found in Chapter 1, in the section titled “Disaster Tolerant Architecture Guidelines.” White papers describing specific implementations are also available at: www.docs.hp.com -> High Availability -> Continentalcluster or Metrocluster -> White Papers 56 Designing a Continental Cluster Data Center Processes Continentalclusters provides the cmrecovercl command that fails over all applications on the primary cluster in a recovery pair that are protected by Continentalclusters. 
However, application failover also requires well-defined processes for the two sites of a recovery pair. These processes and procedures should be written down and made available at both sites. Some considerations for site management are as follows: • • • • • • Who notifies whom for the various events: configuration changes, alerts, alarms? What communication methods should be used? Email? Phone? Beeper? Multiple methods? Who has the authority to perform what sort of configuration modifications? Can the administrator at one site log in to the nodes on the remote site? If so, what permissions would be set? How often is a practice failover done? Is there a documented test plan? What is the process for tracking changes made to the primary cluster? Continentalclusters Worksheets Planning is an essential effort in creating a robust continental cluster environment. It is recommended to record the details of your configuration on planning worksheets. These worksheets can be filled in partially before configuration begins, and then completed as you build the continental cluster. All the participating Serviceguard clusters in one continental cluster should have a copy of these worksheets to help coordinate initial configuration and subsequent changes. Complete the worksheets in the following sections for each recovery pair of clusters that will be monitored by the Continentalclusters monitor. Data Center Worksheet The following worksheet will help you describe your specific data center configuration. Fill out the worksheet and keep it for future reference. ============================================================== Continental Cluster Name: _____________________________________ Continental Cluster State Dir: ________________________________ ============================================================== Primary Data Center Information:_________________________________ Primary Cluster Name: ________________________________________ Data Center Name and Location: _______________________________ Main Contact: _______________________________________________ Phone Number: ________________________________________________ Beeper: ______________________________________________________ Email Address: _______________________________________________ Node Names: __________________________________________________ Monitor Package Name: __ccmonpkg______________________________ Designing a Disaster Tolerant Architecture for use with Continentalclusters 57 Monitor Interval: _____________________________________________ Continental Cluster State Shared Disk: ________________________ ============================================================== Recovery Data Center Information: Recovery Cluster Name: ______________________________________ Data Center Name and Location: ______________________________ Main Contact: _______________________________________________ Phone Number: _______________________________________________ Beeper: _____________________________________________________ Email Address: ______________________________________________ Node Names: _________________________________________________ Monitor Package Name: __ccmonpkg_____________________________ Monitor Interval: ___________________________________________ Continental Cluster State Shared Disk: ________________________ Recovery Group Worksheet The following worksheet will help you organize and record your specific recovery groups. Fill out the worksheet and keep it for future reference. 
=============================================================== Continental Cluster Name: _____________________________________ ============================================================== Recovery Group Data: _________________________________________ Recovery Group Name: _________________________________________ Primary Cluster/Package Name:_________________________________ Data Sender Cluster/Package Name:_____________________________ Recovery Cluster/Package Name:________________________________ Rehearsal Cluster/Package Name: ______________________________ Data Receiver Cluster/Package Name:___________________________ Recovery Group Data:_________________________________________ Recovery Group Name: ________________________________________ Primary Cluster/Package Name:________________________________ Data Sender Cluster/Package Name:___________________________ Recovery Cluster/Package Name:_______________________________ Rehearsal Cluster/Package Name:______________________________ Data Receiver Cluster/Package Name:__________________________ Recovery Group Data: Recovery Group Name: ________________________________________ Primary Cluster/Package Name:________________________________ Data Sender Cluster/Package Name:____________________________ Recovery Cluster/Package Name:_______________________________ Rehearsal Cluster/Package Name:______________________________ Data Receiver Cluster/Package Name:____________________________ Cluster Event Worksheet The following worksheet will help you organize and record the cluster events you wish to track. Fill out a worksheet for each primary or recovery cluster that you wish to monitor. You must monitor each cluster containing a primary package which needs to be recovered. 58 Designing a Continental Cluster Continental Cluster Name: _____________________________________ =============================================================== Cluster Event Information: Cluster Name ________________________________________________ Monitoring Cluster: __________________________________________ UNREACHABLE: Alert Interval:______________________________________________ Alarm Interval:______________________________________________ Notification:_________________________________________________ Notification:_________________________________________________ Notification:_________________________________________________ DOWN: Alert Interval:______________________________________________ Notification:________________________________________________ Notification:_______________________________________________ UP: Alert Interval:_____________________________________________ Notification:_______________________________________________ Notification:_______________________________________________ ERROR: Alert Interval:_____________________________________________ Notification:_______________________________________________ Notification:_______________________________________________ Preparing the Clusters The steps for configuring the clusters, needed by Continentalclusters, are as follows: • • Set up and test data replication between the sites. Configure each cluster for Serviceguard operation. Setting up and Testing Data Replication Depending on which data replication method you choose, it can take a week or more to set up and test a data replication method. If there is more than one recovery pair configured in a continental cluster, a separate data replication link is required to be set up for a different recovery pair (one for each pair). 
In the sample configuration, physical data replication is done through a hardware link between disk arrays. Because this method is hardware based, there is hardware set up and configuration that can take several days. Some logical replication methods, such as transaction processing monitors (TPMs), need application changes that are more easily done during the original application development. Make sure that the data replication to the recovery site is functional. This would include setting up the physical data replication links across the WAN and making sure that the data is replicated between the shared disk arrays. Preparing the Clusters 59 NOTE: If using physical data replication on the HP StorageWorks Disk Array XP Series with Continuous Access XP or HP StorageWorks Disk Array EVA Series with Continuous Access EVA or on the EMC Symmetrix using EMC SRDF, use the special environment file templates that are installed along with either Metrocluster with Continuous Access XP, or Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF software. Refer to Chapters 3, 4 and 5 for detailed instructions on configuring these special environment files. If the data replication software is separate from the application itself, then a separate Serviceguard package should be created for it. Some kinds of logical data replication require that a data receiver package be running on the recovery cluster at all times. If data sender and data receiver packages are used as your choice of data replication method, configure and apply them as described in the next sections before applying the continental cluster configuration. Table 2-5 shows the types of packages that are needed for each type of data replication. Table 2-5 Continentalclusters Data Replication Package Structure Primary Cluster Recovery Cluster Replication Type Primary Application Package Data Replication Sender Package Recovery Application Package Data Replication Receiver Package XP Series Continuous Access Yes No Yes No EVA Series Continuous Access Yes No Yes No Symmetrix/EMC SRDF Yes No Yes No Oracle Standby Database No Yes Yes Yes Configuring a Cluster without Recovery Packages Use the following steps and the instructions described in chapters 4 through 7 of Managing Serviceguard user’s guide as guidelines for creating a new cluster or preparing an existing cluster to run in a Continentalclusters environment: 1. 2. 60 If creating a new cluster, install required versions of HP-UX and Serviceguard. Also, if using an existing cluster, upgrade to the versions of HP-UX and Serviceguard that are required for Continentalclusters. See the Continentalclusters Release Notes for specifics on your versions requirements. Coordinate with the recovery site to make sure the same versions and patches are installed at both sites. Set up all cabling, being sure to provide redundant disk storage links and network connections. Designing a Continental Cluster 3. 4. 5. 6. Configure the disks and filesystems. Set up data replication (logical or physical). Configure the cluster according to the instructions in chapter 5 of the Managing Serviceguard user’s guide. Use the cmapplyconf command to apply the cluster configuration. Then test the cluster. Configure and test each primary package according to the instructions in chapters 6 and 7 of the Managing Serviceguard user’s guide. Use the cmapplyconf command to apply the package configuration. 
Be sure that AUTO_RUN is set to NO in the package ASCII configuration file for any package that is in a recovery group and therefore might at some time be a candidate for recovery. This ensures that the package will not be started automatically if the primary site tries to come up again following a primary site disaster. If you change the setting of the AUTO_RUN parameter to NO in the ASCII configuration file for an existing package, re-apply the configuration using the cmapplyconf command.
NOTE: When package switching is disabled, a package does not automatically start at cluster startup time. Therefore, setting AUTO_RUN (PKG_SWITCHING_ENABLED) to NO means that primary packages in recovery groups must be started manually on the primary cluster. They also must be manually enabled for local switching, using the cmmodpkg -e command.
7. Test local failover of the packages. In our sample case, this would mean enabling package switching for salespkg (cmmodpkg -e salespkg) and then testing that salespkg fails over from LAnode1 to LAnode2.
8. If using logical data replication, configure and test the data sender package if one is needed.
NOTE: If you are configuring Oracle RAC instances in Serviceguard packages in a CFS or CVM environment, do not specify the CVM_DISK_GROUPS and CVM_ACTIVATION_CMD fields in the package control scripts, because CVM disk group manipulation is addressed by the disk group multi-node package.
The primary cluster is shown in Figure 2-5.
Figure 2-5 Sample Local Cluster Configuration (the Los Angeles cluster: LAnode1 running salespkg and LAnode2 running custpkg, connected to the WAN over a highly available network)
Configuring a Cluster with Recovery Packages
Use the following steps and the instructions in chapters 4 through 7 of the Managing Serviceguard user’s guide as guidelines for creating a new recovery cluster or preparing an existing cluster to run in a Continentalclusters environment:
1. Configure all hardware. Make sure the cluster hardware is able to handle the task of running any or all packages it supports in the Continentalclusters configuration:
a. If this is a new cluster, make sure the hardware is similar to that of the other cluster. The recovery cluster must be built using servers of sufficient size and resources so that they can take over packages on recovery and also run their own packages, if required.
b. If this is an existing cluster, determine whether it is necessary to add disks for data replication, and ensure that there is enough capacity in system resources to run all packages if applications fail over to this cluster. If not, either add nodes to the existing cluster or move less critical packages to another cluster.
2. For new clusters, install the minimum required versions of HP-UX and Serviceguard. For existing clusters, perform a rolling upgrade to the minimum required versions of HP-UX and Serviceguard if necessary. Coordinate with the other site to make sure the same versions and patches are installed at both sites. This may include coordinating between HP support personnel if the sites have separate support contracts.
3. Configure logical volumes, using the same names on both clusters. If your cluster uses a physical data replication method and data replication between the disk arrays at the different data centers has already taken place, vgimport and vgchange can be used to help configure the logical volumes on the recovery cluster.
4. Use cmgetconf to capture the other cluster’s configuration. Then use cmquerycl on this cluster to generate a new ASCII file for the recovery configuration. Modify the node names, volume group names, resource names, and subnets as appropriate so that the two clusters will be consistent. See chapter 5 in the Managing Serviceguard user’s guide for details on cluster configuration.
5. Set up the recovery package(s):
a. Copy the package files from the other cluster in the recovery pair for all mission-critical applications to be monitored by Continentalclusters. In the sample configuration this means copying the ASCII files salespkg.config and custpkg.config, and the control scripts salespkg.cntl and custpkg.cntl. (If preferred, rename the package configuration files using a naming convention that identifies a package as a Continentalclusters monitored package. For example, name the sample package salespkg_bak.config to indicate that it is the backup or recovery package.)
b. Edit the package configuration files, replacing node names, subnets, and other elements as needed. For all recovery packages, be sure that AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) is set to NO in the configuration file. This ensures that the recovery packages will not start automatically when the recovery cluster forms, but only when the cmrecovercl command is issued. The following elements should be the same in the package configuration for both the primary and recovery packages:
• Package services
• Failfast settings
c. Modify the package control script (salespkg_bak.cntl), checking for anything that may be different between clusters:
• Volume groups (VGs) may be different.
• IP addresses may be different.
• Site-specific customer-defined routines (for example, routines that send messages to a local administrator) may be different.
• Control script files must be executable.
NOTE: If you are using physical data replication on the HP StorageWorks Disk Array XP Series with Continuous Access XP, on the HP StorageWorks Disk Array EVA Series with Continuous Access EVA, or on the EMC Symmetrix using EMC SRDF, use the special environment file templates that are provided by the separately purchased Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF products.
6. Apply the configuration using cmapplyconf and test the cluster.
IMPORTANT: You must halt the primary package and the data sender packages before you attempt to run or test any recovery packages.
7. Test local failover of the packages. In our sample case, this would mean enabling package switching for salespkg_bak (cmmodpkg -e salespkg_bak) and then testing that salespkg_bak fails over from NYnode1 to NYnode2.
8. If you are using logical data replication, configure, apply, and test the data receiver package if one is needed.
9. Create a package control script:
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
The New York cluster is shown in Figure 2-6.
Figure 2-6 Sample Cluster Configuration with Recovery Packages New York Cluster NYnode1 Highly Available Network salespkg_bak Disk Array NYnode2 custpkg_bak WAN Configuring Recovery Groups with Rehearsal Packages The rehearsal package is a regular Serviceguard package configured on the recovery cluster of the recovery group. You must configure the rehearsal package with the same volume group and file system mount points as that of the recovery package. The application setup and configuration used for the recovery package are also used for the rehearsal package. Similar to all other Continentalclusters packages, the AUTO_RUN parameter for the rehearsal package must be set to NO. 64 Designing a Continental Cluster NOTE: When using physical replication, do not configure the Metrocluster environment files for the rehearsal package. The rehearsal package must have an IP address that is different from that of the recovery package. If the rehearsal package has the same IP address as the recovery package, clients may connect to the rehearsal instance mistaking it for the production instance. Building the Continentalclusters Configuration If necessary, use the swinstall command to install the Continentalclusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps: • • • • • • • • Prepare the security files. Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster in the recovery pair by creating a monitor package on that cluster. Edit the Continentalclusters configuration file on a node of your choice in any cluster. Check and apply the Continentalclusters configuration. Start each Continentalclusters monitor package on it’s cluster. Validate the configuration. Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures. Test recovery procedures. Preparing Security Files Running a Continentalclusters command requires root access to cluster information on all the nodes of the participating Serviceguard clusters in the configuration. Before doing the Continentalclusters configuration, edit the /etc/cmcluster/ cmclnodelist file on each node of all the participating clusters to include entries that will allow access by all nodes in the Continentalclusters. Here is a sample entry in the /etc/cmcluster/cmclnodelist file for a continental cluster configured with two, two-node Serviceguard clusters: lanode1.myco.com lanode2.myco.com nynode1.myco.com nynode2.myco.com root root root root Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must Building the Continentalclusters Configuration 65 contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands are running. Define the order of security checking by creating entries of the following types: order deny,allow If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked. deny from lists all the nodes that are denied access. Permissible entries are: All hosts are denied access. 
all allow from 66 Designing a Continental Cluster domain Hosts whose names match, or end in, this string are denied access, for example, hp.com. hostname The named host (for example, kitcat.myco.com) is denied access. IP address Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet restriction is denied. network/netmask This pair of addresses allows more precise restriction of hosts, (for example, 10.163.121.23/225.225.0.0). network/nnnCIDR This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP). This lists all the nodes that are allowed access. Permissible entries are: All hosts are allowed access. all domain Hosts whose names match, or end in, this string are allowed access, for example, hp.com. hostname The named host (for example, kitcat.myco.com) is allowed access. IP address Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion is allowed. network/netmask This pair of addresses allows more precise inclusion of hosts, (for example, 10.163.121.23/225.225.0.0). network/nnnCIDR This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP). The most typical entry is hostname. The following entries are from a typical /etc/ opt/cmom/cmomhosts file: order allow allow allow allow allow allow,deny from lanode1.myco.com from lanode2.myco.com from nynode1.myco.com from nynode2.myco.com from 10.177.242.12 If the file is installed on all nodes in the continental cluster, these entries will allow Continentalclusters commands and monitors running on lanode1, lanode2, nynode1, nynode2 to obtain information about the clusters in the configuration. Network Security Configuration Requirements In a Continentalclusters configuration, if the clusters are behind firewalls in their respective sites, you must set appropriate firewall rules to enable inter-cluster communication. The monitoring daemon of Continentalclusters communicates with Serviceguard Cluster Object Manager on remote clusters. You can determine the ports used by Cluster Object Manager from the hacl-probe entry in the /etc/services file. In the firewall of all participating clusters, you must set the rule such that TCP and UDP protocol traffic on the hacl-probe ports are allowed from and to the IP addresses of all nodes in the Continentalclusters configuration. For more information on firewall and ports, see HP Serviceguard A.11.18 Release Notes available at http://www.docs.hp.com -> High Availability. Creating the Monitor Package The Continentalclusters monitoring software is configured as a Serviceguard package so that it remains highly available. If more than one primary cluster is configured to share the same common recovery cluster, such as a multiple recovery pair scenario, Building the Continentalclusters Configuration 67 the monitor package running on the common recovery cluster performs the following: • • monitors all of the primary clusters sends notifications for all of the monitored clusters events The following steps should be carried out on the recovery cluster and can be repeated on the primary cluster if you want the primary cluster to monitor the recovery cluster: 1. 
On the node where the configuration is located, create a directory for the monitor package:
# mkdir /etc/cmcluster/ccmonpkg
2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:
# cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg
• ccmonpkg.config is the ASCII package configuration file template for the Continentalclusters monitoring application.
• ccmonpkg.cntl is the control script file for the Continentalclusters monitoring application.
NOTE: Editing the ccmonpkg.cntl file is not recommended. However, if preferred, change the default SERVICE_RESTART value “-r 3” to a value that fits your environment.
3. Edit the package configuration file (suggested name /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:
a. Add the names of all nodes in the cluster on which the monitor may run.
b. Set AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) to YES so that the monitor package will fail over between local nodes. (Note that for all primary and recovery packages, AUTO_RUN is always set to NO.)
4. Continentalclusters provides an optional feature for placing recovery groups in maintenance mode. To enable this feature, configure the monitor package with a file system on a shared disk. For more information on configuring the maintenance mode feature, see “Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters” (page 69).
5. Use the cmcheckconf command to validate the package:
# cmcheckconf -P ccmonpkg.config
6. Copy the package configuration file ccmonpkg.config and control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.
7. Use the cmapplyconf command to add the package to the Serviceguard configuration:
# cmapplyconf -P ccmonpkg.config
The following sample package configuration file (comments have been left out) shows a typical package configuration for a Continentalclusters monitor package:
PACKAGE_NAME                 ccmonpkg
PACKAGE_TYPE                 FAILOVER
FAILOVER_POLICY              CONFIGURED_NODE
FAILBACK_POLICY              MANUAL
NODE_NAME                    LAnode1
NODE_NAME                    LAnode2
AUTO_RUN                     YES
LOCAL_LAN_FAILOVER_ALLOWED   YES
NODE_FAIL_FAST_ENABLED       NO
RUN_SCRIPT                   /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
RUN_SCRIPT_TIMEOUT           NO_TIMEOUT
HALT_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
HALT_SCRIPT_TIMEOUT          NO_TIMEOUT
SERVICE_NAME                 ccmonpkg.srv
SERVICE_FAIL_FAST_ENABLED    NO
SERVICE_HALT_TIMEOUT         300
CAUTION: Do not run a monitor package until the steps for “Checking and Applying the Continentalclusters Configuration” (page 87) are completed.
Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters
To configure the recovery group maintenance feature, you must configure a file system on a shared disk in all the clusters configured in the Continentalclusters. The shared disk must have a minimum of 250 MB of disk space. Specify the file system path using the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. Create this directory and reserve it for Continentalclusters on all nodes in the Continentalclusters. Configure the monitor package in the recovery clusters to mount the file system from the shared disk.
Configuring Shared Disk for the Maintenance Feature
Identify a shared disk connected to all nodes at the recovery cluster where the monitor package (ccmonpkg) will run.
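Before creating the volume group (next), it can be useful to confirm that the candidate disk is visible on every node of the recovery cluster. A typical check on HP-UX is shown below; the device file used in the following procedure, /dev/c0t10d0, is the one to look for in the output on each node:
# ioscan -fnC disk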
Create a volume group with one volume on the shared disk and complete the following procedure: 1. Create the physical volume: pvcreate -f /dev/c0t10d0 2. Create volume group directory under the device special file namespace: mkdir /dev/ccvg Building the Continentalclusters Configuration 69 3. Create the group special file using the available major number: mknod /dev/ccvg/group c 64 0x060000 4. Create the volume group: vgcreate /dev/ccvg /dev/c0t10d0 5. Activate the volume group: vgchange -a y ccvg 6. Create the logical volume: lvcreate -L 250M ccvg Run the following command to create a file system on the volume: mkfs vxfs /dev/ccvg/lvol1 Complete the following procedure to export the volume group configuration and import the volume group on all the nodes at the recovery cluster: 1. On the node where you created the volume, deactivate the volume group and export the VG configuration in preview mode to a file: vgchange -a n ccvg vgexport -m /tmp/ccvg.map -p ccvg 2. Copy the file to all the nodes: rcp /tmp/ccvg.map node1:/tmp 3. On each node, create the volume group directory and the group special file: mkdir /dev/ccvg mknod /dev/ccvg/group c 64 0x060000 4. Import the volume group from the map file: vgimport -m /tmp/ccvg.map -v Configuring a Monitor Package for the Maintenance Feature Configure the Continentalclusters monitor package using the template scripts available in the /opt/cmconcl/scripts/ directory: 1. 2. Create the /etc/cmcluster/ccmonpkg directory on all nodes in the recovery cluster. On any node in the recovery cluster, copy the package configuration and control file template from the /opt/cmconcl/scripts directory to the /etc/ cmcluster directory: cp /opt/cmconcl/scripts/ccmonpkg.* 70 Designing a Continental Cluster 3. In the ccmonpkg.cntl monitor package control file, specify the volume group for the VG parameter in the VOLUME GROUPS section: VG[0]="ccvg" 4. In the ccmonpkg.cntl monitor package control file, specify a file system path and the logical volume name under the FILE SYSTEM section. The file system path should be the value configured for the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. This path should be created and reserved on all nodes in the Continentalcluster. LV[0]=/dev/ccvg/lvol1; FS[0]=/opt/cmconcl/statedir; FS_MOUNT_OPT[0]="-o rw"; FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""; FS_TYPE[0]="vxfs" 5. 6. Distribute the monitor package control file to all nodes in the recovery cluster. Apply the monitor package configuration. Editing the Continentalclusters Configuration File First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/ cmcluster/cmconcl.config. (If preferred, choose a different name.) Example: # cd /etc/cmcluster # cmqueryconcl -C cmconcl.config This file has three editable sections: • Cluster information • Recovery groups • Monitoring definitions Customize each section according to your needs. The following are some guidelines for editing each section. Editing Section 1—Cluster Information Enter cluster-level information as follows in this section of the file: 1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. Choose any name, but it cannot be easily changed after the configuration is applied. To change the name, it is required to first delete the existing configuration as described in “Renaming a Continental Cluster” (page 115). 
Continentalclusters provides an optional maintenance feature for recovery groups. This feature is enabled by configuring an absolute path to a file system for the Building the Continentalclusters Configuration 71 CONTINENTAL_CLUSTER_STATE_DIR parameter. If this feature is not required, this parameter can be omitted. 2. 3. 4. Enter the name of the first cluster after the first CLUSTER_NAME keyword followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node. Enter the domain name of the cluster’s nodes following the DOMAIN_NAME keyword. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword. The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system’s performance. Setting this interval too low can result in the monitor’s falsely reporting an Unreachable or Error state. If this is observed during testing, use a larger value. It is suggested to use the name “ccmonpkg” for all Continentalclusters monitors. Create this package on each cluster containing a recovery package. If it is not desired to monitor a cluster, which does not containing a recovery package, it is required to delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters. NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, set up monitoring of such a cluster to be able to check the status of the data replication technology being used. 5. Repeat steps 2 through 4 for the other participating cluster or clusters. NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times. A printout of Section 1 of the Continentalclusters ASCII configuration file follows. ################################################################ #### #### CONTINENTAL CLUSTER CONFIGURATION FILE #### #### #### #### This file contains Continentalclusters #### #### #### #### configuration data. #### #### #### #### The file is divided into three sections, #### #### #### #### as follows: #### #### #### #### 1. Cluster Information #### #### #### #### 2. Recovery Groups #### #### #### #### 3. Events, Alerts, Alarms, and #### #### #### #### Notifications #### #### 72 Designing a Continental Cluster #### #### #### #### #### #### For complete details about how to set the #### #### #### #### parameters in this file, consult the #### #### #### #### cmqueryconcl(1m) manpage or your manual. #### #### ################################################################ #### #### Section 1. Cluster Information #### #### #### #### This section contains the name of the #### #### #### #### continental cluster,name of the state #### #### #### #### directory, followed by the names of member #### #### #### #### clusters and all their nodes.The #### #### #### #### continental cluster name can be any string #### #### #### #### you choose, up to 40 characters in length. 
#### #### #### #### The continentalclusters state directory #### #### #### #### must be string containing the directory #### #### #### #### location. The state directory must be #### #### #### #### always an absolute path. The state #### #### #### #### directory should be created on a shared #### #### #### #### disk in the recovery cluster. This #### #### #### #### parameter is optional, if maintenance mode #### #### #### #### feature recovery groups is not required. #### #### #### #### This parameter is mandatory, if maintenance #### #### #### #### mode feature for recovery groups is #### #### #### #### required. #### #### #### #### Each member cluster name must be the same #### #### #### #### as it appears in the MC/ServiceGuard cluster ######## #### #### configuration ASCII file for that cluster. #### #### #### #### In addition to the cluster name, include a #### #### #### #### domain name for the nodes in the cluster. #### #### #### #### Node Names must be the same as those that #### #### #### #### appear in the cluster configuration ASCII #### #### #### #### file. A minimum of two member cluster needs #### #### #### #### to be specified. You may configure one #### #### #### #### cluster to serve as recovery cluster for #### #### #### #### one or more other clusters. #### #### #### #### #### #### #### #### In the space below, enter the continental #### #### #### #### cluster name, then enter a cluster name for #### #### #### #### each member cluster, followed by the names #### #### #### #### of all the nodes in that cluster.Following #### #### #### #### the node names, enter the name of a monitor #### #### #### #### package that will run the continental #### #### #### #### cluster monitoring software on that cluster.#### #### #### #### It is strongly recommended that you use the #### #### #### #### same name for the monitoring package on all #### #### #### #### clusters; "ccmonpkg" is suggested. #### #### #### #### Monitoring of the recovery cluster by the #### #### #### #### primary cluster is optional. If you do not #### #### #### #### wish to monitor the recovery cluster, you #### #### #### #### must delete or comment out the #### #### #### #### MONITOR_PACKAGE_NAME and MONITOR_INTERVAL #### #### #### #### lines that follow the name of the primary #### #### #### #### cluster. #### #### Building the Continentalclusters Configuration 73 #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### After the monitor package name, enter a monitor interval,specifying a number of minutes and/or seconds. The default is 60 seconds, the minimum is 30 seconds, and the maximum is 5 minutes. 
#### #### #### #### #### #### CLUSTER_NAME westcoast #### CLUSTER_DOMAIN westnet.myco.com #### NODE_NAME system1 #### NODE_NAME system2 #### MONITOR_PACKAGE_NAME ccmonpkg #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS#### #### #### CLUSTER_NAME eastcoast #### CLUSTER_DOMAIN eastnet.myco.com #### NODE_NAME system3 #### NODE_NAME system4 #### MONITOR_PACKAGE_NAME ccmonpkg #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS #### #### CONTINENTAL_CLUSTER_NAME ccluster1 #### CONTINENTAL_CLUSTER_STATE_DIR #### CLUSTER_NAME #### CLUSTER_DOMAIN #### NODE_NAME #### NODE_NAME #### MONITOR_PACKAGE_NAME ccmonpkg #### MONITOR_INTERVAL 60 SECONDS #### CLUSTER_NAME #### CLUSTER_DOMAIN #### NODE_NAME #### NODE_NAME #### MONITOR_PACKAGE_NAME ccmonpkg #### MONITOR_INTERVAL 60 SECONDS #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### Editing Section 2 – Recovery Groups In this section of the file, define recovery groups, which are sets of Serviceguard packages that are ready to recover applications in case of cluster failure. Create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster. Examples of recovery groups are shown graphically in Figure 2-7 and Figure 2-8. 74 Designing a Continental Cluster Figure 2-7 Sample Continentalclusters Recovery Groups New York Cluster Recovery Group for Sales Application: RECOVERY_GROUP_NAME PRIMARY_PACKAGE RECOVERY_PACKAGE NYnode1 Sales LAcluster/salespkg NYcluster/salespkg_bak Los Angeles Cluster LAnode1 NYnode2 salespkg_bak.config custpkg_bak.conf salespkg_bak.cntl custpkg_bak.cntl WAN LAnode2 salespkg.config custpkg.config salespkg.cntl custpkg.cntl Recovery Group for Customer Application: RECOVERY_GROUP_NAME Customer PRIMARY_PACKAGE LAcluster/custpkg RECOVERY_PACKAGE NYcluster/custpkg_bak Building the Continentalclusters Configuration 75 Figure 2-8 Sample Bi-directional Recovery Groups New York Cluster Recovery Group for Sales Application: RECOVERY_GROUP_NAME PRIMARY_PACKAGE RECOVERY_PACKAGE Sales LAcluster/salespkg NYcluster/salespkg_bak NYnode1 NYnode2 salespkg_bak.config salespkg_bak.cntl custpkg.cntl WAN Los Angeles Cluster LAnode1 custpkg.config LAnode2 salespkg.config custpkg_bak.conf salespkg.cntl custpkg_bak.cntl Recovery Group for Customer Application: RECOVERY_GROUP_NAME PRIMARY_PACKAGE RECOVERY_PACKAGE Customer NYcluster/custpkg LAcluster/custpkg_bak Enter data in Section 2 as follows: 1. 2. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example: PRIMARY_PACKAGE LAcluster/custpkg 3. 76 Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package. Designing a Continental Cluster 4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example: RECOVERY_PACKAGE NYcluster/custpkg_bak 5. 6. 7. 
5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if using a logical data replication method that requires a data receiver package.
6. Optionally, enter a rehearsal package definition consisting of the cluster name, a slash (/), and the rehearsal package name after the REHEARSAL_PACKAGE keyword. This is only required for performing a rehearsal operation at the recovery cluster.
7. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.

A printout of Section 2 of the Continentalclusters ASCII configuration file follows.

###############################################################
#### Section 2. Recovery Groups
####
#### This section defines recovery groups--sets of ServiceGuard
#### packages that are ready to recover applications in case of
#### cluster failure. Recovery groups allow one cluster in the
#### continental cluster configuration to back up another member
#### cluster's packages. You create a separate recovery group for
#### each ServiceGuard package that will be started on the
#### recovery cluster when the cmrecovercl(1m) command is issued.
####
#### A recovery group consists of a primary package running on one
#### cluster and a recovery package that is ready to run on a
#### different cluster. In some cases, a data receiver package
#### runs on the same cluster as the recovery package, and in some
#### cases, a data sender package runs on the same cluster as the
#### primary package. For rehearsal operations, a rehearsal
#### package forms a part of the recovery group. The rehearsal
#### package is always configured in the recovery cluster.
####
#### During normal operation, the primary package is running an
#### application program on the primary cluster, and the recovery
#### package, which is configured to run the same application, is
#### idle on the recovery cluster. If the primary package performs
#### disk I/O, the data that is written to disk is replicated and
#### made available for possible use on the recovery cluster. For
#### some data replication techniques, this involves the use of a
#### data receiver package running on the recovery cluster.
#### In the event of a major failure on the primary cluster, the
#### user issues the cmrecovercl(1m) command to halt any data
#### receiver packages and start up all the recovery packages that
#### exist on the recovery cluster.
####
#### During rehearsal operation, before starting the rehearsal
#### packages, care should be taken that the replication between
#### the primary and the recovery sites is suspended. For some
#### data replication techniques which involve the use of a data
#### receiver package, rehearsal operations must be commenced only
#### after shutting down the data receiver package at the recovery
#### cluster. Rehearsal packages are started using the
#### cmrecovercl -r command.
####
#### Enter the name of each package recovery group together with
#### the fully qualified names of the primary and recovery
#### packages. If appropriate, enter the fully qualified name of a
#### data receiver package. Note that the data receiver package
#### must be on the same cluster as the recovery package.
####
#### The primary package name includes the primary cluster name
#### followed by a slash ("/") followed by the package name on the
#### primary cluster. The recovery package name includes the
#### recovery cluster name, followed by a slash ("/") followed by
#### the package name on the recovery cluster.
####
#### The data receiver package name includes the recovery cluster
#### name, followed by a slash ("/") followed by the name of the
#### data receiver package on the recovery cluster. The rehearsal
#### package name includes the recovery cluster name, followed by
#### a slash ("/") followed by the name of the rehearsal package
#### on the recovery cluster.
####
#### Up to 29 recovery groups can be entered.
####
#### Example:
#### RECOVERY_GROUP_NAME     nfsgroup
#### PRIMARY_PACKAGE         westcoast/nfspkg
#### DATA_SENDER_PACKAGE     westcoast/nfssenderpkg
#### RECOVERY_PACKAGE        eastcoast/nfsbackuppkg
#### DATA_RECEIVER_PACKAGE   eastcoast/nfsreplicapkg
#### REHEARSAL_PACKAGE       eastcoast/nfsrehearsalpkg
####
#### RECOVERY_GROUP_NAME     hpgroup
#### PRIMARY_PACKAGE         westcoast/hppkg
#### DATA_SENDER_PACKAGE     westcoast/hpsenderpkg
#### RECOVERY_PACKAGE        eastcoast/hpbackuppkg
#### DATA_RECEIVER_PACKAGE   eastcoast/hpreplicapkg
#### REHEARSAL_PACKAGE       eastcoast/hprehearsalpkg

Editing Section 3—Monitoring Definitions
Finally, enter monitoring definitions that define cluster events and set times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events—Unreachable, Down, Up, and Error. Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.
1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash ("/"), and the name of the status—Unreachable, Down, Up, or Error. Example:
   CLUSTER_EVENT LAcluster/UNREACHABLE
2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event.
   Create as many alerts as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.
3. If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. Create as many alarms as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types).
4. If using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.

A printout of Section 3 of the Continentalclusters ASCII configuration file follows.

################################################################
#### Section 3. Monitoring Definitions
####
#### This section of the file contains monitoring definitions.
#### Well planned monitoring definitions will help in making the
#### decision whether or not to issue the cmrecovercl(1m) command.
#### Each monitoring definition specifies a cluster event along
#### with the messages that should be sent to system
#### administrators or other IT staff.
####
#### All messages are appended to the default log
#### /var/opt/resmon/log/cc/eventlog as well as to the destination
#### you specify below.
####
#### A cluster event takes place when a monitor that is located on
#### one cluster detects a significant change in the condition of
#### another cluster. The monitored cluster conditions are:
####
#### UNREACHABLE - the cluster is unreachable. This will occur
#### when the communication link to the cluster has gone down, as
#### in a WAN failure, or when all the nodes in the cluster have
#### failed.
####
#### DOWN - the cluster is down but nodes are responding. This
#### will occur when the cluster is halted, but some or all of the
#### member nodes are booted and communicating with the monitoring
#### cluster.
####
#### UP - the cluster is up.
####
#### ERROR - there is a mismatch of cluster versions or a security
#### error.
####
#### A change from one of these conditions to another one is a
#### cluster event. You can define alert or alarm states based on
#### the length of time since the cluster event was observed. Some
#### events are noteworthy at the time they occur, and some are
#### noteworthy when they persist over time.
#### Setting the elapsed time to zero results in a message being
#### sent as soon as the event takes place. Setting the elapsed
#### time to 5 minutes results in a message being sent when the
#### condition has persisted for 5 minutes.
####
#### An alert is intended as informational only. Alerts may be
#### sent for any type of cluster condition. For an alert, a
#### notification is sent to a system administrator or other
#### destination. Alerts are not intended to indicate the need for
#### recovery. The cmrecovercl(1m) command is disabled.
####
#### An alarm is an indication that a condition exists that may
#### require recovery. For an alarm, a notification is sent, and
#### in addition, the cmrecovercl(1m) command is enabled for
#### immediate execution, allowing the administrator to carry out
#### cluster recovery. An alarm can only be defined for an
#### UNREACHABLE or DOWN condition in the monitored cluster.
####
#### A notification defines a message that is appended to the log
#### file /var/opt/resmon/log/cc/eventlog and sent to other
#### specified destinations, including email addresses, SNMP
#### traps, the system console, or the syslog file. The message
#### string in a notification can be no more than 170 characters.
####
#### Enter notifications in one of the following forms:
####
#### NOTIFICATION CONSOLE <message>
#### Message written to the console.
####
#### NOTIFICATION EMAIL <email_address> <message>
#### Message emailed to a fully qualified email address.
####
#### NOTIFICATION OPC <severity> <message>
#### The <message> is sent to OpenView IT/Operations. The value of
#### <severity> may be 8 (normal), 16 (warning), 64 (minor),
#### 128 (major), or 32 (critical).
####
#### NOTIFICATION SNMP <severity> <message>
#### The <message> is sent as an SNMP trap. The value of
#### <severity> may be 1 (normal), 2 (warning), 3 (minor),
#### 4 (major), or 5 (critical).
####
#### NOTIFICATION SYSLOG <message>
#### A notice of the event is appended to the syslog file.
####
#### NOTIFICATION TCP <node>:<port> <message>
#### Message is sent to a TCP port on the specified node.
####
#### NOTIFICATION TEXTLOG <filename> <message>
#### A notice of the event is written to a user-specified log
#### file. <filename> must be a full path for the user-specified
#### file. The user-specified file must be under the
#### /var/opt/resmon/log directory.
####
#### NOTIFICATION UDP <node>:<port> <message>
#### Message is sent to a UDP port on the specified node.
####
#### For the cluster event, enter a cluster name followed by a
#### slash ("/") and a cluster condition (UP, DOWN, UNREACHABLE,
#### ERROR) that may be detected by a monitor program.
####
#### Each cluster event must be paired with a monitoring cluster.
#### Include the name of the cluster on which the monitoring will
#### take place. Events can be monitored from either the primary
#### cluster or the recovery cluster.
####
#### Alerts, alarms, and notifications have the following syntax:
####
#### CLUSTER_ALERT <minutes> MINUTES <seconds> SECONDS
#### Delay before the software issues an alert notification about
#### the cluster event.
####
#### CLUSTER_ALARM <minutes> MINUTES <seconds> SECONDS
#### Delay before the software issues an alarm notification about
#### the cluster event and enables the cmrecovercl(1m) command for
#### immediate execution.
####
#### NOTIFICATION <destination> <message>
#### A string value which is sent from the monitoring cluster for
#### a given event to a specified destination. The <message>,
#### which can be no more than 170 characters, is also appended to
#### the /var/opt/resmon/log/cc/eventlog file on the monitoring
#### node in the cluster where the event was detected.
####
#### Example:
####
#### CLUSTER_EVENT westcoast/UNREACHABLE
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 5 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 5 min. Call secondary site."
#### NOTIFICATION EMAIL [email protected]
#### "Call primary admin. (555) 555-6666."
####
#### CLUSTER_ALERT 10 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 10 min. Call secondary site."
#### NOTIFICATION EMAIL [email protected]
#### "Call primary admin. (555) 555-6666."
#### NOTIFICATION CONSOLE
#### "Cluster ALERT: westcoast not responding."
#### CLUSTER_ALARM 15 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "westcoast status unknown for 15 min. Takeover advised."
#### NOTIFICATION EMAIL [email protected]
#### "westcoast still not responding. Use cmrecovercl command."
#### NOTIFICATION CONSOLE
#### "Cluster ALARM: Issue cmrecovercl command to take over westcoast."
####
#### CLUSTER_EVENT westcoast/UP
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Cluster westcoast is up."
####
#### CLUSTER_EVENT westcoast/DOWN
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Cluster westcoast is down."
####
#### CLUSTER_EVENT westcoast/ERROR
#### MONITORING_CLUSTER eastcoast
#### CLUSTER_ALERT 0 MINUTES
#### NOTIFICATION EMAIL [email protected]
#### "Error in monitoring cluster westcoast."

CLUSTER_EVENT           /UNREACHABLE
MONITORING_CLUSTER
CLUSTER_ALERT

The TEXTLOG notification file should be placed under the /var/opt/resmon/log directory. If any other directory is specified, an error is reported by the cmapplyconcl and cmcheckconcl commands. If you specify any other location for logging, the following error message appears:

The target after textlog "<filename>" is not valid. Please specify a file under the /var/opt/resmon/log directory.

If you upgraded Continentalclusters but are still using the old configuration file, the textlog location is still specified as /var/adm/cmconcl. As a result, the following error message appears:

The file path "<path>" specified for textlog is invalid. The destination file must be under the /var/opt/resmon/log directory. Please change the path and restart the ccmon package.

IMPORTANT: For TEXTLOG notification, the destination log file must be in the /var/opt/resmon/log directory. If the destination file is not available in this directory, Continentalclusters will not work properly.

Selecting Notification Intervals
The monitor interval determines the amount of time between distinct attempts by the monitor to obtain the status of a cluster. The intervals associated with notifications need to be chosen to work in combination with the monitor interval to give a realistic picture of cluster events. Some combinations are not useful. For example, notification intervals that are smaller than the monitor interval do not make sense, and should be avoided. In the following example, the cluster event will always result in two alerts followed by an alarm. No change of state could possibly be detected at the one-minute, two-minute and three-minute intervals, because the monitor does not check for changes until the monitor interval (5 minutes) has been reached.

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 5 MINUTES
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 1 MINUTE
NOTIFICATION CONSOLE
"1 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 2 MINUTES
NOTIFICATION CONSOLE
"2 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 3 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 3 Minutes: Recovery Enabled"

The following sequence could provide meaningful notifications, since a change of state is possible between notification intervals:

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 1 MINUTE
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 3 MINUTES
NOTIFICATION CONSOLE
"3 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 10 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 10 Minutes: Recovery Enabled"

NOTE: The notification intervals should be multiples of the monitor interval.

The following is a sample Continentalclusters configuration file with two recovery pairs. Both cluster1 and cluster2 are configured to have cluster3 as their recovery cluster for packages pkg1 and pkg2, and cluster3 is configured to have cluster1 as its recovery cluster for pkg3.

# Section 1: Cluster Information

CONTINENTAL_CLUSTER_NAME       sampleCluster
CONTINENTAL_CLUSTER_STATE_DIR  /opt/cmconcl/statedir

CLUSTER_NAME          cluster1
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node11
NODE_NAME             node12
MONITOR_PACKAGE_NAME  ccmonpkg
MONITOR_INTERVAL      60 seconds

CLUSTER_NAME          cluster2
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node21
NODE_NAME             node22

CLUSTER_NAME          cluster3
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node31
NODE_NAME             node32
MONITOR_PACKAGE_NAME  ccmonpkg
MONITOR_INTERVAL      60 seconds

RECOVERY_GROUP_NAME   ccRG1
PRIMARY_PACKAGE       cluster1/pkg1
RECOVERY_PACKAGE      cluster3/pkg1'
REHEARSAL_PACKAGE     cluster3/pkg4'

RECOVERY_GROUP_NAME   ccRG2
PRIMARY_PACKAGE       cluster2/pkg2
RECOVERY_PACKAGE      cluster3/pkg2'

RECOVERY_GROUP_NAME   ccRG3
PRIMARY_PACKAGE       cluster3/pkg3
RECOVERY_PACKAGE      cluster1/pkg3'
# Section 3. Monitoring Definitions

CLUSTER_EVENT cluster1/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster1 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster1 DOWN alarm"

CLUSTER_EVENT cluster2/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster2 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster2 DOWN alarm"

CLUSTER_EVENT cluster3/DOWN
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/logging "DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster3 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster3 DOWN alarm"

CLUSTER_EVENT cluster1/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster1 UP alert"

CLUSTER_EVENT cluster2/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster2 UP alert"

CLUSTER_EVENT cluster3/UP
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster3 UP alert"

Checking and Applying the Continentalclusters Configuration
After editing the configuration file on any of the participating clusters in the Continentalcluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.
1. Verify the content of the file.
   # cmcheckconcl -v -C cmconcl.config
   This command verifies that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.
2. Distribute the Continentalclusters configuration information to all nodes in the continental cluster.
   # cmapplyconcl -v -C cmconcl.config
   Configuration data is copied to all nodes in all the participating clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in all clusters.
3. Be sure to make a backup copy of the configuration ASCII file and save it on the other cluster after it is applied.

NOTE: If any problems occur during the execution of cmapplyconcl, repeat the command as often as necessary. Issuing the command deletes the existing Continentalclusters configuration and applies the new one.
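One quick, informal way to confirm that the distribution succeeded is to check for the managed object files on a node in each member cluster. This is only a sketch; the node names below are illustrative, and any remote shell mechanism available at your site can be used in place of remsh.

# remsh node11 "ls /etc/cmconcl/instances"
# remsh node31 "ls /etc/cmconcl/instances"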
When configuration is finished, your systems should have sets of files similar to those shown in Figure 2-9.

Figure 2-9 Continentalclusters Configuration Files
[Figure: Every node in both clusters holds the Continentalclusters configuration file (cmconcl.config), the monitor package files (ccmonpkg.config, ccmonpkg.cntl), and the managed object files in /etc/cmconcl/instances/*. The New York cluster nodes also hold the recovery package files (salespkg_bak.config, salespkg_bak.cntl, custpkg_bak.config, custpkg_bak.cntl), and the Los Angeles cluster nodes hold the primary package files (salespkg.config, salespkg.cntl, custpkg.config, custpkg.cntl).]

Starting the Continentalclusters Monitor Package
Starting the monitoring package enables all Continentalclusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly. If using physical data replication, make sure that it is operational.

On each monitoring cluster, start the monitor package:
# cmmodpkg -e ccmonpkg

After the monitor package is started, a log file /var/adm/cmconcl/sentryd.log is created on the node where the package is running to record the Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.

Validating the Configuration
The following table shows the status of Continentalclusters packages in a recovery pair when each cluster is running normally and no recovery has taken place.

Table 2-6 Status of Continentalclusters Packages Before Recovery

Physical data replication (Symmetrix, XP Series, or EVA Series):
  Primary cluster:  primary package Running; data sender package Not used; monitor package Running (optional)
  Recovery cluster: recovery package Halted; data receiver package Not used; monitor package Running (required)

Logical data replication (Oracle Standby Database):
  Primary cluster:  primary package Running; data sender package Not used; monitor package Running (optional)
  Recovery cluster: recovery package Halted; data receiver package Running; monitor package Running (required)

Use the following steps to ensure the components are functioning correctly:
1. Make sure all daemons are running.
   # ps -ef | grep cmcl
   Two important Continentalclusters daemons are cmclsentryd and cmclrmond.
2. Check the cluster configuration on each cluster using the cmviewcl -v command.
   a. Ensure that each primary package is running correctly.
   b. Ensure that the data sender packages (if any are used for logical data replication) are running correctly.
   c. Ensure that the data receiver packages (if any are used for logical data replication) are running correctly.
   d. Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.
3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.
4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.
5. Use the following command to verify the correct operation of the Continentalclusters daemon:
   # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log
6. Make sure the Continentalclusters monitor packages (default name ccmonpkg) on each cluster fail over properly if a node fails.
7. Change each cluster's state to test that the monitor running on the monitoring cluster detects the change in status and sends notification.
8. View the status of the Continentalclusters primary and recovery clusters, including configured event data.
   # cmviewconcl -v

CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.

Chapters 3, 4 and 5 contain additional suggestions on testing the data replication and package configuration.

Documenting the Recovery Procedure
Once everything is configured and the Continentalclusters monitor is running, it is necessary to define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 2-10 is an example of how to document the recovery procedure.

Figure 2-10 Recovery Checklist
[Figure: A sample checklist covering: identify the level of alert that the monitoring site received (cluster alert or cluster alarm); contact the monitored site by phone or beeper to rule out a WAN networking failure with the primary cluster and packages still fine, or a cluster or package that has come back up with the UP notification not yet received by the recovery site; get authorization from the monitored site (authorized person contacted, human-to-human voice authorization or voice mail); notify the monitored site of successful recovery (authorized person contacted, voice confirmation or voice mail).]

Reviewing the Recovery Procedure
Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.

Testing the Continental Cluster
This section presents some test procedures and scenarios. Some scenarios presume certain configurations that may not apply to all environments. Additionally, these tests do not eliminate the need to perform standard Serviceguard testing for each cluster individually.

CAUTION: Data and system corruption can occur as a result of testing. System and data backups should always be done prior to testing.

Testing Individual Packages
Use procedures like the following to test individual packages:
1. Use the cmhaltpkg command to shut down the package on the primary cluster that corresponds to the package to be tested on the recovery cluster.
2. Do not switch any users to the recovery cluster. The application must be inaccessible to users during this test.
3. Start up the package to be tested on the recovery cluster using the cmrunpkg command.
4. Access the application manually using a mechanism that tests network connectivity.
5. Perform read-only actions to verify that the application is running appropriately.
6. Shut down the application on the recovery cluster using the cmhaltpkg command.
7. If using physical data replication, do not resync from the recovery cluster to the primary cluster. Instead, manually issue a command that will overwrite any changes on the recovery disk array that may inadvertently have been made.
8. Start the package up on the primary cluster and allow connection to the application.
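As a rough illustration of this procedure, the commands below sketch the test for the Sales application used in the earlier figures. The package and node names (salespkg, salespkg_bak, LAnode1, NYnode1) are taken from those figures and are examples only; substitute the packages and nodes in your own configuration.

# cmhaltpkg salespkg                  (on the Los Angeles primary cluster)
# cmrunpkg -n NYnode1 salespkg_bak    (on the New York recovery cluster)
  ...verify the application with read-only actions...
# cmhaltpkg salespkg_bak              (on the recovery cluster, after verification)
# cmrunpkg -n LAnode1 salespkg        (restart the primary package)
# cmmodpkg -e salespkg                (re-enable switching for the primary package)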
Testing Continentalclusters Operations
Use the following procedures to exercise typical Continentalclusters behaviors:
1. Halt both clusters in a recovery pair, then restart both clusters. The monitor packages on both clusters should start automatically. The Continentalclusters packages (primary, data sender, data receiver, and recovery) should not start automatically. Any other packages may or may not start automatically, subject to their configuration.

   NOTE: If an UP status is configured for a cluster, then an appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package on the other cluster. Due to delays in email or SNMP, the notifications may arrive later than expected. In addition to alerts and alarms sent using the mechanisms defined in the Continentalclusters configuration file, they are also recorded in the file /var/opt/resmon/log/cc/eventlog on the system reporting the event.

2. While the monitor package is running on a monitoring cluster, halt the monitored cluster (cmhaltcl -f). An appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package. Run cmrecovercl. The command should fail. Additional notifications should be received at the configured time intervals.
3. After the alarm notification is received, run cmrecovercl. Any data receiver packages on the monitoring cluster should halt and the recovery package(s) should start with package switching enabled. Halt the recovery packages.
4. Rerun test 2 under a variety of conditions (and multiple conditions) such as the following:
   • Rebooting and powering off systems one at a time
   • Rebooting and powering off all systems at the same time
     — Running the monitor package on each node in each cluster
     — Disconnecting the WAN connection between the clusters
   • If physical data replication is used, disconnecting the physical replication links between the disk arrays:
     — Powering off the disk array at the primary site
     — Powering off the disk array at the recovery site
   • Testing cmrecovercl -f as well as cmrecovercl
   Depending on the condition, the primary packages should be running to test real life failures and recovery procedures.
5. After each scenario in tests 2-4, restore both clusters to their production state, restart the primary package(s) (as well as any data sender and data receiver packages), and note any issues, time delays, etc.
6. Halt the monitor package on one cluster. Halt the other cluster. No notifications are generated that the other cluster has failed. What mechanism is available to the organization to monitor the monitor?
7. Halt the packages on one cluster, but do not halt the cluster. No notifications are generated that the packages on that cluster have failed. What mechanism is available to the organization to monitor package status?

   NOTE: Continentalclusters monitors cluster status, but not package status.

8. View the status of the continental cluster.
   # cmviewconcl

Switching to the Recovery Packages in Case of Disaster
Once the clusters are configured and tested, packages will be able to fail over to an alternate node in another data center and still have access to the data they need to function. The primary steps for failing over a package are:
1. Receive notification that a monitored cluster is unavailable.
2. Verify that it is necessary and safe to start the recovery packages.
3. Use the recovery command to stop data replication and start recovery packages.
4. View the status of the continental cluster.
   # cmviewconcl
It is important to have a well-defined recovery process and to train all team members at both sites on how to use it.

Receiving Notification
Once the monitor is started, as described in “Starting the Continentalclusters Monitor Package” (page 88), the monitor will send notifications as configured. The following types of notifications are generated as configured in the Continentalclusters configuration file:
• CLUSTER_ALERT indicates a change in the status of a cluster. Recovery via the cmrecovercl command is not enabled by default. This should be treated as information that the cluster either may be developing a problem or may be recovering from a problem.
• CLUSTER_ALARM indicates a change in the status of a cluster showing that the cluster has been unavailable for an unacceptable amount of time. Recovery via the cmrecovercl command is enabled.

The issuing of notifications takes place at the timing intervals specified for each cluster event. However, it sometimes may appear that an alert or alarm takes longer than configured. Keep in mind that if several changes of cluster state (for example, Down to Error to Unreachable to Down) take place in a smaller time than the configured interval for an alert or alarm, the timer is reset to 0 after each change of state; thus, the time to the alert or alarm will be the configured interval plus the time used by all the earlier state changes.

NOTE: The cmrecovercl command is fully enabled only after a CLUSTER_ALARM is issued; however, the command may be used with the -f option when a CLUSTER_ALERT has been issued.

Verifying that Recovery is Needed
It is important to follow the established protocol for coordinating with the remote site to determine whether moving the package is required. This includes initiating person-to-person communication between sites. For example, it may be possible that the WAN network failed, causing the cluster alarm. Some network failures, such as those that prevent clients from using the application, may require recovery. Other network failures, such as those that only prevent the two clusters from communicating, may not require recovery. Following an established protocol for communicating with the remote site would verify this. See Figure 2-10 (page 91) for an example of a recovery checklist.

Using the Recovery Command to Switch All Packages
If a data replication technology other than Metrocluster Continuous Access XP, Metrocluster Continuous Access EVA, or Metrocluster with EMC SRDF is used, perform the following steps before executing the Continentalclusters recovery command, cmrecovercl.
Once notification has been received, there is coordination between the sites in a recovery pair (for a sample worksheet, see “Documenting the Recovery Procedure” (page 90)), and it has been determined that moving the package is necessary:
• Check to make sure the data used by the application is in a usable state. Usable state means the data is consistent and recoverable, even though it may not be current.
• Check to make sure the secondary devices are in read-write mode. If you are using database or software data replication, make sure the data copy at the recovery site is in read-write mode as well.
• If LVM and physical data replication are used, the ID of the primary cluster is also replicated and written on the secondary devices in the recovery site. The ID of the primary cluster must be cleared and the ID of the recovery cluster must be written on the secondary devices before they can be used.
  If LVM exclusive-mode is used, issue the following commands from a node in the recovery cluster on all the volume groups that are used by the recovery packages:
  # vgchange -c n <vg_name>
  # vgchange -c y <vg_name>
  If LVM shared-mode (SLVM) is used, from a node in the recovery cluster, issue the following commands:
  # vgchange -c n -S n <vg_name>
  # vgchange -c y -S y <vg_name>
• If VxVM and physical data replication are used, the host name of a node in the primary cluster is the host name of the last owner of the disk group. It is also replicated and written on the secondary devices in the recovery site. The host name of the last owner of the disk group must be cleared out before the secondary devices can be used. If VxVM is used, issue the following command from a node in the recovery cluster on all the disk groups that are used by the recovery packages (a combined example follows this list):
  # vxdg deport <disk_group_name>
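The following is a minimal sketch of this storage preparation on the recovery cluster. The volume group and disk group names (/dev/vgsales and dgsales) are illustrative only; run the appropriate commands for every volume group or disk group actually used by your recovery packages.

# vgchange -c n /dev/vgsales        (clear the primary cluster ID on an LVM exclusive-mode VG)
# vgchange -c y /dev/vgsales        (write the recovery cluster ID)
# vgchange -c n -S n /dev/vgsales   (same sequence for a shared-mode SLVM VG)
# vgchange -c y -S y /dev/vgsales
# vxdg deport dgsales               (clear the last-owner host name on a VxVM disk group)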
To Start the Failover Process
Use the following command to start the failover process:
# cmrecovercl

If a notification defined in a CLUSTER_ALARM statement in the configuration file has not been received, but a CLUSTER_ALERT has been received and the remote site has confirmed the need to fail over, then override the disabled cmrecovercl command by using the -f forcing option:
# cmrecovercl -f
Use this command only after positive confirmation from the remote site. The cmrecovercl command will skip recovery for recovery groups in maintenance mode.

In a multiple recovery pair configuration where more than one primary cluster is sharing the same recovery cluster, running cmrecovercl without any option will attempt to recover packages for all of the recovery groups of the configured primary clusters. Recovery can also be done in this multiple recovery pair case on a per cluster basis by using option -c:
# cmrecovercl -c <primary_cluster_name>

If the monitored cluster comes back up following an alert or alarm, but it is certain that the primary packages cannot start (say, because of damage to the disks on the primary site), then use a special procedure to initiate recovery:
1. Use the cmhaltcl command to halt the primary cluster.
2. Wait for the monitor to send an alert.
3. Use cmrecovercl -f to perform recovery.

After the cmrecovercl command is issued, Continentalclusters displays a warning message such as the following and prompts for a verification that recovery should proceed (the names “LAcluster” and “NYcluster” are examples):

WARNING: This command will take over for the primary cluster "LAcluster" by starting the recovery package on the recovery cluster "NYcluster". You must follow your site disaster recovery procedure to ensure that the primary packages on "LAcluster" are not running and that recovery on "NYcluster" is necessary. Continuing with this command while the applications are running on the primary cluster may result in data corruption. Are you sure that the primary packages are not running and will not come back, and are you certain that you want to start the recovery packages? [Y/N]

Reply “Y” to proceed only if you are certain that recovery should take place. After replying “Y”, a group of messages appears as each recovery group is processed, as shown below (the message about the data receiver package appears only when using logical data replication with data sender and receiver packages):

Processing the recovery group nfsgroup on recovery cluster eastcoast.
Disabling switching for data receiver package nfsreceiverpkg on recovery cluster eastcoast.
Halting data receiver package nfsreceiverpkg on recovery cluster eastcoast.
Starting recovery package nfsbackuppkg on recovery cluster eastcoast.
Enabling package nfsbackuppkg in cluster eastcoast.
---------------- exit status = 0 ----------------

The cmrecovercl command starts up all the recovery packages that are configured in the recovery groups. The cmrecovercl -c command will skip recovery for recovery groups in maintenance mode.

In addition to starting the recovery packages all at once, another option is to recover an individual recovery group by using the following command:
# cmrecovercl -g Recovery_Group_Name
Running cmrecovercl with option -g starts up only the recovery package configured in the specified recovery group. The cmrecovercl -g command fails to recover if the specified recovery group is in maintenance mode.

NOTE: After the cmrecovercl command is issued, there is a delay of at least 90 seconds per recovery group as the command makes sure that the package is not active on another cluster.

Use the cmviewcl command on the local cluster to confirm that the recovery packages are running correctly.
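For example, to recover only the Sales recovery group from the earlier figures and then confirm the result, a sequence like the following could be used on the recovery cluster; the recovery group name Sales is illustrative.

# cmrecovercl -g Sales    (start only the recovery package for the Sales recovery group)
# cmviewcl -v             (confirm that the recovery package is running on the recovery cluster)
# cmviewconcl -v          (review the continental cluster status and configured event data)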
Following recovery, halt the package that was monitoring the remote cluster if preferred. If this is not done, notifications will continue to be received whenever there is a change in the remote cluster’s state.

The following table shows the status of Continentalclusters packages after recovery has taken place, when applications are running on the local cluster.

Table 2-7 Status of Continentalclusters Packages After Recovery

Physical data replication (Symmetrix, XP Series, or EVA Series):
  Primary cluster:  primary package Halted; data sender package Not used; monitor package Halted or Running
  Recovery cluster: recovery package Running; data receiver package Not used; monitor package Halted or Running

Logical data replication (Oracle Standby Database):
  Primary cluster:  primary package Halted; data sender package Not used; monitor package Halted or Running
  Recovery cluster: recovery package Running; data receiver package Halted; monitor package Halted or Running

How the cmrecovercl Command Works
The cmrecovercl command uses the configuration file to loop through each defined recovery group of a target remote cluster to be recovered. For each recovery group that is not in maintenance mode, the command communicates with the monitor package (ccmonpkg) and verifies that the remote cluster is unreachable or down; then, if there is a data replication package, it is halted, and the recovery package is enabled on the recovery cluster. The recovery package can then start up on the local cluster on the appropriate node, as determined by the FAILOVER_POLICY configured for the package. The process continues for the next recovery group, even if there are problems with one recovery group. The command will skip recovery for any recovery group in maintenance mode.

After processing one recovery group, if the command discovers that the remote cluster is back up, the command exits, since the alarm or alert state no longer exists. This process keeps the primary and recovery packages from running on the remote cluster and local cluster at the same time, which would result in data corruption.

NOTE: If the remote cluster comes back up following a cluster event but the primary packages cannot run, halt the primary cluster with the cmhaltcl command, then issue cmrecovercl with the -f option.

Forcing a Package to Start
The cmforceconcl command is used to force a Continentalclusters package to start even if the status of a remote package in the recovery group is unknown. This command is used as a prefix to a cmrunpkg or cmmodpkg command. Under normal circumstances, Continentalclusters will not allow a package to start in the recovery cluster unless it can determine that the package is not running in the primary cluster. In some cases, communication between the two clusters may be lost, and it may be necessary to start the package on the recovery cluster anyway. To do this, use the cmforceconcl command along with a cmrunpkg or cmmodpkg command, as in the following example:
# cmforceconcl cmrunpkg -n node3 Pkg1

CAUTION: When using this command, ensure that the other cluster is not running the package. Failure to do this may result in the package running in both clusters, which will cause data corruption.

Restoring Disaster Tolerance
After a failover to a cluster occurs, restoring disaster tolerance has many challenges, the most significant of which are:
• Restoring the failed cluster. Depending on the nature of the disaster, it may be necessary either to create a new cluster or to restore the cluster. Before starting up the new or the failed cluster, make sure the AUTO_RUN flag for all of the Continentalclusters application packages is disabled. This prevents the packages from starting unexpectedly with the cluster.
• Resynchronizing the data. To resynchronize the data, you either restore the data to the cluster and continue with the same data replication procedure, or set up data replication to function in the other direction.

The following sections briefly outline some scenarios for restoring disaster tolerance.

Restore Clusters to their Original Roles
If the disaster did not destroy the cluster, there is the option to return both clusters in a recovery pair to their original roles. To do this:
1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.
2. On each cluster, stop the Continentalclusters monitor package if it is still running.
   # cmhaltpkg ccmonpkg
3. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.
4. For each recovery group where the repaired cluster will run the primary package:
   a. Synchronize the data from the disks on the surviving cluster to the disks on the repaired cluster. This may be time-consuming.
   b. Halt the recovered application on the surviving cluster if necessary, and start it on the repaired cluster.
   c. To keep application down time to a minimum, start the primary package on the cluster before resynchronizing the data of the next recovery group.
5. Restart the monitor using the following command on each cluster:
   # cmrunpkg ccmonpkg
   Alternatively, if the monitoring package configuration has been modified, use the following sequence on each cluster to apply the new configuration and start the monitor:
   # cmapplyconf -P ccmonpkg.config
   # cmmodpkg -e ccmonpkg
6. View the status of the Continentalcluster.
   # cmviewconcl

Primary Packages Remaining on the Surviving Cluster
Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This minimizes the downtime involved with moving the applications back to the restored cluster. It also assumes that the surviving cluster has sufficient resources to handle running all critical applications indefinitely.

NOTE: In a multiple recovery pairs scenario, where more than one primary cluster is configured to share the same recovery cluster, the following procedure to switch the role of the failed cluster and the surviving cluster should not be used.

Use the following procedure:
1. Halt the monitor packages. Issue the following command on each cluster:
   # cmhaltpkg ccmonpkg
2. Edit the Continentalclusters ASCII configuration file. It is necessary to change the definitions of monitoring clusters, and switch the names of primary and recovery packages in the definitions of recovery groups (see the sketch following this procedure). It may also be necessary to re-create data sender and data receiver packages.
3. Check and apply the Continentalclusters configuration.
   # cmcheckconcl -v -C cmconcl.config
   # cmapplyconcl -v -C cmconcl.config
4. Restart the monitor packages on each cluster.
   # cmmodpkg -e ccmonpkg
5. View the status of the Continentalcluster.
   # cmviewconcl
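The following sketch illustrates the role switch performed in step 2 for a single recovery group; the cluster and package names follow the earlier LAcluster/NYcluster examples and are illustrative only.

Before the failover (original roles):
RECOVERY_GROUP_NAME   Sales
PRIMARY_PACKAGE       LAcluster/salespkg
RECOVERY_PACKAGE      NYcluster/salespkg_bak

After editing (the surviving cluster NYcluster remains primary):
RECOVERY_GROUP_NAME   Sales
PRIMARY_PACKAGE       NYcluster/salespkg_bak
RECOVERY_PACKAGE      LAcluster/salespkg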
Before applying the edited configuration, the data storage associated with each cluster needs to be prepared to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.

Primary Packages Remaining on the Surviving Cluster using cmswitchconcl
Continentalclusters provides the cmswitchconcl command to facilitate steps two and three described in the section “Primary Packages Remaining on the Surviving Cluster”. The cmswitchconcl command is used to switch the roles of primary and recovery packages of the Continentalclusters recovery groups for which the specified cluster is defined as the primary cluster.

Do not use the cmswitchconcl command in a multiple recovery pair configuration where more than one primary cluster is sharing the same recovery cluster. Otherwise, the command will fail.

When switching roles for a recovery group configured with a rehearsal package, the rehearsal package in the old recovery cluster should be removed before the configuration is applied. The newly generated recovery group configuration will not have any rehearsal package configured.

WARNING! When you configure the maintenance mode for a recovery group, you must move all recovery groups whose roles have been switched out of the maintenance mode before applying the new configuration.

NOTE: Before running the cmswitchconcl command, the data storage associated with each cluster needs to be prepared properly to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster. The cmswitchconcl command cannot be used for recovery groups that have both data sender and data receiver packages specified.

To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedure:
1. Halt the monitor package on each cluster.
   # cmhaltpkg ccmonpkg
2. Run this command:
   # cmswitchconcl \
     -C currentContinentalclustersConfigFileName \
     -c oldPrimaryClusterName \
     [-a] [-F NewContinentalclustersConfigFileName]
   The above command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which “OldPrimaryClusterName” is defined as the primary cluster.
   The default values of monitoring package name (ccmonpkg) and interval (60 seconds), and notification scheme (SYSLOG) with notification delay (0 seconds), are added for cluster “OldPrimaryClusterName”, which will serve as the recover-only cluster. If editing of the default values is desired, do it in the file “NewContinentalclusterConfigFileName” if -F is specified, or in the file “CurrentContinentalclustersConfigFileName” if -F is not specified. If editing of the new configuration file is needed, do not use the -a option. If option -a is specified, the new configuration is applied automatically.
3. If option -a is specified with cmswitchconcl in step 2, skip this step. Otherwise, manually apply the new Continentalclusters configuration.
   # cmapplyconcl -v -C newContinentalclustersConfigFileName (if -F is specified in step 2)
   # cmapplyconcl -v -C CurrentContinentalclustersConfigFileName (if -F is not specified in step 2)
4. Restart the monitor packages on each cluster.
   # cmmodpkg -e ccmonpkg
5. View the status of the Continentalcluster.
   # cmviewconcl

NOTE: The cluster shared storage configuration file /etc/cmconcl/ccrac/ccrac.config is not updated by cmswitchconcl. The CCRAC_CLUSTER and CCRAC_INSTANCE_PKGS variables in the cluster shared storage configuration file must be manually updated on all nodes in the clusters to reflect the new primary cluster and package names.

The cmswitchconcl command is also used to switch the package role of a single recovery group. If only a subset of the primary packages will remain running on the surviving (recovery) cluster, the -g option is provided with the cmswitchconcl command. This option reconfigures the roles of the packages of a recovery group and helps retain recovery protection after a failover. Usage of option -g (recovery group based role switch reconfiguration) is the same as that of -c (cluster based role switch reconfiguration). Note that options -c and -g of the cmswitchconcl command are mutually exclusive.
# cmswitchconcl \
  -C currentContinentalclustersConfigFileName \
  -g RecoverGroupName \
  [-a] [-F NewContinentalclustersConfigFileName]
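For example, to switch roles for a single recovery group and write the result to a new configuration file for review before applying it, an invocation along the following lines could be used. The file and recovery group names are illustrative only.

# cmswitchconcl -C cmconcl.config -g ccRG1 -F cmconcl.new.config
# cmapplyconcl -v -C cmconcl.new.config    (after reviewing and, if needed, editing the generated file)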
The following is a sample of the input and output files for running cmswitchconcl -C sample.input -c ClusterA -F Sample.out.

sample.input
============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster

CLUSTER_NAME          ClusterA
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node1
NODE_NAME             node2
MONITOR_PACKAGE_NAME  ccmonpkg

CLUSTER_NAME          ClusterB
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node3
NODE_NAME             node4
MONITOR_PACKAGE_NAME  ccmonpkg
MONITOR_INTERVAL      60 SECONDS

### Section 2. Recovery Groups
RECOVERY_GROUP_NAME    RG1
PRIMARY_PACKAGE        ClusterA/pkgX
RECOVERY_PACKAGE       ClusterB/pkgX'

RECOVERY_GROUP_NAME    RG2
PRIMARY_PACKAGE        ClusterA/pkgY
RECOVERY_PACKAGE       ClusterB/pkgY'
DATA_RECEIVER_PACKAGE  ClusterB/pkgR1

RECOVERY_GROUP_NAME    RG3
PRIMARY_PACKAGE        ClusterB/pkgZ
RECOVERY_PACKAGE       ClusterA/pkgZ'

RECOVERY_GROUP_NAME    RG4
PRIMARY_PACKAGE        ClusterB/pkgW
RECOVERY_PACKAGE       ClusterA/pkgW'
DATA_RECEIVER_PACKAGE  ClusterA/pkgR2

### Section 3. Monitoring Definitions
CLUSTER_EVENT ClusterA/DOWN
MONITORING_CLUSTER ClusterB
CLUSTER_ALERT 60 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/data/events.log "CC alert: DOWN"
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM 90 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/data/events.log "CC alarm: DOWN"
NOTIFICATION SYSLOG "CC alarm: DOWN"

Sample output (Sample.out)
==========================
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster

CLUSTER_NAME          ClusterA
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node1
NODE_NAME             node2
MONITOR_PACKAGE_NAME  ccmonpkg
MONITOR_INTERVAL      60 SECONDS

CLUSTER_NAME          ClusterB
CLUSTER_DOMAIN        cup.hp.com
NODE_NAME             node3
NODE_NAME             node4

### Section 2. Recovery Groups
RECOVERY_GROUP_NAME    RG1
PRIMARY_PACKAGE        ClusterB/pkgX'
RECOVERY_PACKAGE       ClusterA/pkgX

RECOVERY_GROUP_NAME    RG2
PRIMARY_PACKAGE        ClusterB/pkgY'
RECOVERY_PACKAGE       ClusterA/pkgY
DATA_RECEIVER_PACKAGE  ClusterA/pkgR1

RECOVERY_GROUP_NAME    RG3
PRIMARY_PACKAGE        ClusterB/pkgZ
RECOVERY_PACKAGE       ClusterA/pkgZ'

RECOVERY_GROUP_NAME    RG4
PRIMARY_PACKAGE        ClusterB/pkgW
RECOVERY_PACKAGE       ClusterA/pkgW'
DATA_RECEIVER_PACKAGE  ClusterA/pkgR2
CLUSTER_EVENT ClusterB/DOWN
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG "CC alarm: DOWN"

CLUSTER_EVENT ClusterB/UNREACHABLE
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: UNREACHABLE"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG "CC alarm: UNREACHABLE"

CLUSTER_EVENT ClusterB/ERROR
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: ERROR"

CLUSTER_EVENT ClusterB/UP
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG "CC alert: UP"

Newly Created Cluster Will Run Primary Packages

After creating a new cluster to replace the damaged cluster, restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages.

1. Configure the new cluster as a Serviceguard cluster. Use the cmviewcl command on the surviving cluster and compare the results to the new cluster configuration. Correct any inconsistencies on the new cluster.

2. Halt the monitor package on the surviving recovery cluster.
   # cmhaltpkg ccmonpkg

3. Edit the continental cluster configuration file to replace the data from the old failed cluster with data from the new cluster. Check and apply the Continentalclusters configuration.
   # cmcheckconcl -v -C cmconcl.config
   # cmapplyconcl -v -C cmconcl.config

4. Do the following for each recovery group where the new cluster will run the primary package:
   a. Synchronize the data from the disks on the surviving recovery cluster to the disks on the new cluster. This may be time-consuming.
   b. Halt the application on the surviving recovery cluster if necessary, and start it on the new cluster.
   c. To keep application down time to a minimum, start the primary package on the cluster before resynchronizing the data of the next recovery group.

5. If the new cluster acts as a recovery cluster for any recovery group, create a monitor package for the new cluster. Apply the configuration of the new monitor package.
   # cmapplyconf -p ccmonpkg.config

6. Restart the monitor package on the surviving cluster.
   # cmrunpkg ccmonpkg

7. View the status of the continental cluster.
   # cmviewconcl

Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups

After replacing the failed cluster, if the downtime involved in moving the applications back is a concern, then do the following:
• Change the surviving cluster to the role of primary cluster for all recovery groups.
• Configure the new cluster as a recovery cluster for all those groups.

Configure the new cluster as a standard Serviceguard cluster, and follow the usual procedure to configure the continental cluster with the new cluster used as a recovery cluster for all recovery groups.

NOTE: In a multiple recovery pairs scenario (where more than one primary cluster is configured to share the same recovery cluster), do not reconfigure the recovery cluster because of the failure of one of the primary clusters.

Performing a Rehearsal Operation in your Environment

Use the cmrecovercl -r -g command to start the disaster recovery rehearsal process in your environment. This command checks for the following prerequisites before starting the rehearsal process:
• The recovery group is in the maintenance mode.
• The data receiver package, if configured in the recovery group, is halted and disabled in the recovery cluster.
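As an illustration of the second prerequisite, a minimal sketch that halts and disables a data receiver package on the recovery cluster, assuming the hypothetical data receiver package name pkgR1 from the sample configuration shown earlier in this section (substitute your own package name):

# cmhaltpkg pkgR1
# cmmodpkg -d pkgR1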
The rehearsal package runs regardless of the state of the primary cluster. When the rehearsal is in progress, any attempt to start the recovery package is prevented because the recovery group is in the maintenance mode. This prevents the recovery and the rehearsal packages from running at the same time on the recovery cluster.

Following is an example of running the cmrecovercl -r command to rehearse the recovery group oracle_rac1 on a cluster called secondary_cluster.

atlanta:/opt/cmconcl/admin/instances> cmrecovercl -r -g oracle_rac1

Warning: For this recovery group ensure that the replication environment
has been prepared for rehearsal. Before proceeding further, verify that
a business copy has been prepared at the recovery cluster. This command
does not verify that a business copy has been prepared.

Do you want to proceed with rehearsing the recovery group? [y/n]? y

cmrecovercl: Attempting to rehearse Recovery Group oracle_rac1 on cluster
secondary_cluster.

Note: The configuration file /etc/cmconcl/ccrac/ccrac.config for cluster
shared storage exists. If the primary package in the target group is
configured within this file, the replication environment preparation
will be verified before starting the rehearsal package. If you choose
"n" make sure that the required storage for the rehearsal package has
been properly prepared and that the replication environment has been
prepared.

Is this what you intended to do? [y/n]? y

Enabling rehearsal package racp-cfs-rehearsal on recovery cluster secondary_cluster
Running package racp-cfs-rehearsal on node atlanta
Successfully started package racp-cfs-rehearsal on node atlanta
Running package racp-cfs-rehearsal on node miami
Successfully started package racp-cfs-rehearsal on node miami
Successfully started package racp-cfs-rehearsal.

cmrecovercl -r Completed rehearsal process for each recovery group.
Rehearsal packages have been started.
Use cmviewcl or check package log file to verify that the rehearsal
packages are successfully started.

Warning: Once the rehearsal is complete and the rehearsal package is
halted ensure that replication environment is restored for recovery
and move recovery group out of Maintenance Mode.

During the rehearsal, if a primary site failure occurs, Continentalclusters detects it and you need to complete a recovery process. You need to restore the environment for recovery and complete the recovery processes. If the recovery group data cannot be synchronized with the latest data from the primary cluster, you can use the business copy (BC/BCV) prepared during the preparation phase. However, this results in the loss of the delta data accumulated since the time the rehearsal was started.

For more information on performing disaster recovery (DR) rehearsal for different types of applications and replication in a Continentalclusters environment, see Appendix G. This appendix describes how to set up and run DR rehearsal using the example of a single instance Oracle application in Continentalclusters with Continuous Access XP integration. For additional examples of setting up and running DR rehearsal in different environments, see the Disaster Recovery Rehearsal in Continentalclusters whitepaper available at: http://docs.hp.com.
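As a sketch of concluding the rehearsal, assuming the rehearsal package name used in the example above (racp-cfs-rehearsal), you might halt the rehearsal package and then confirm the Continentalclusters status; restoring the replication environment and moving the recovery group out of the maintenance mode follow your site-specific and array-specific procedures:

# cmhaltpkg racp-cfs-rehearsal
# cmviewconcl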
Maintaining a Continental Cluster

The following common maintenance tasks are described in this section:
• Adding a Node to a Cluster or Removing a Node from a Cluster
• Adding a Package to a Continental Cluster
• Removing a Rehearsal Package from a Recovery Group
• Modifying a Recovery Group with a new Rehearsal Package
• Removing a Package from the Continental Cluster
• Changing Monitoring Definitions
• Checking the Status of Clusters, Nodes and Packages
• Reviewing Log Files
• Renaming a Continental Cluster
• Deleting a Continental Cluster configuration
• Checking Java Versions

CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is significant.

Adding a Node to a Cluster or Removing a Node from a Cluster

To add a node to or remove a node from the continental cluster, use the following procedure:

1. Halt any monitor packages that are running on both clusters.
   # cmhaltpkg ccmonpkg

2. Add or remove the node in a cluster by editing the Serviceguard cluster configuration file and applying the configuration.
   # cmapplyconf -C cluster.config

3. Edit the Continentalclusters configuration ASCII file to add or remove the node in the cluster.

4. For added nodes, ensure that the /etc/cmcluster/cmclnodelist and /etc/opt/cmom/cmomhosts files are set up correctly on the new node. Refer to “Preparing Security Files” (page 65). Ensure that the cmclnodelist and cmomhosts files on all nodes (including the new node) contain an entry allowing write access by the host on which you are running the configuration commands.

5. Check and apply the configuration using the cmcheckconcl and cmapplyconcl commands.

6. Restart the monitor packages on both clusters.

7. View the status of the continental cluster.
   # cmviewconcl

Adding a Package to the Continental Cluster

To add a new package for possible recovery to the Continentalclusters configuration, first configure a new primary package and recovery package, then add a new recovery group to the Continentalclusters configuration file. In addition, ensure that data replication is provided for the new package, either through hardware or software. Adding a new package does not require bringing down either cluster. However, in order to implement the new configuration, the following steps are required:

1. Configure the new primary and recovery packages by editing the new package configuration files and control scripts.

2. Use the Serviceguard cmapplyconf command to add the primary package to one cluster, and the recovery package to the other cluster.

3. Provide the appropriate data replication for the new package.

4. Create the new recovery group in the Continentalclusters configuration file.

5. Ensure that the cmclnodelist and cmomhosts files on all nodes contain an entry allowing write access by the host on which you are running the configuration commands.

6. Halt the monitor packages on both clusters.

7. Use the cmapplyconcl command to apply the new Continentalclusters configuration.

8. Restart the monitor packages on both clusters.

9. View the status of the continental cluster.
   # cmviewconcl

Removing a Rehearsal Package from a Recovery Group

To remove a rehearsal package from a recovery group, you must move the recovery group out of the maintenance mode and then delete the rehearsal package from the recovery cluster. Also, you need to update the Continentalclusters configuration file by removing the REHEARSAL_PACKAGE parameter from the recovery group definition. Distribute the Continentalclusters configuration by reapplying the configuration file.

Modifying a Recovery Group with a new Rehearsal Package

To change the rehearsal package configured for a recovery group, you need to first move the recovery group out of the maintenance mode. Then the old rehearsal package must be deleted from the recovery cluster and the new rehearsal package must be configured in the recovery cluster. Update the Continentalclusters configuration file by specifying the new rehearsal package name for the REHEARSAL_PACKAGE parameter in the recovery group definition. Distribute the Continentalclusters configuration by reapplying the configuration file.

Removing a Package from the Continental Cluster

To remove a package from the Continentalclusters configuration, you must first remove the recovery group from the Continentalclusters configuration file. Removing the package does not require you to bring down either cluster. However, in order to implement the new configuration, the following steps are required:

1. Edit the continental clusters configuration file, deleting the recovery group.

2. Halt the monitor packages that are running on the clusters.

3. Use the cmapplyconcl command to apply the new Continentalclusters configuration.

4. Restart the monitor packages on both clusters.

5. Use the Serviceguard cmdeleteconf command to remove each package in the recovery group.

6. View the status of the continental cluster.
   # cmviewconcl

Changing Monitoring Definitions

You can change the monitoring definitions in the configuration without bringing down either cluster. This includes adding, removing, or changing the cluster events; changing the timings; and adding, removing, or changing the notification messages. Use the following steps to change the monitoring definitions:

1. Edit the continental clusters configuration file to incorporate the new or changed monitoring definitions.

2. Halt the monitor packages on both clusters.

3. Use the cmapplyconcl command to apply the new configuration.

4. Restart the monitor packages on both clusters.

5. View the status of the continental cluster.
   # cmviewconcl

Checking the Status of Clusters, Nodes, and Packages

To check on the status of the continental clusters and associated packages, use the cmviewconcl command, which lists the status of the clusters, associated package status, and configured events status. This command also displays the mode of the recovery group, if configured. The following is an example of cmviewconcl output in a situation where there is a single recovery group for which the primary cluster is cjc838 and the recovery cluster is cjc1234.
# cmviewconcl

WARNING: Primary cluster cjc838 is in an alarm state
(cmrecovercl is enabled on recovery cluster cjc1234)

CONTINENTAL CLUSTER cjccc1

RECOVERY CLUSTER cjc1234
  PRIMARY CLUSTER   STATUS   EVENT LEVEL   POLLING INTERVAL
  cjc838            down     ALARM         20

PACKAGE RECOVERY GROUP prg1      MAINTENANCE MODE NO
  PACKAGE             ROLE        STATUS
  cjc838/primary      primary     down
  cjc1234/recovery    recovery    up
  cjc1234/rehearsal   rehearsal   down

The following is an example of cmviewconcl output from a primary cluster that is down.

persian (root 2131): cmviewconcl -v

WARNING: Primary cluster cjc838 is in an alarm state
(cmrecovercl is enabled on recovery cluster cjc1234)
Primary cluster cjc838 is not configured to monitor recovery cluster cjc1234

CONTINENTAL CLUSTER cjccc1

RECOVERY CLUSTER cjc1234
  PRIMARY CLUSTER   STATUS   EVENT LEVEL   POLLING INTERVAL
  cjc838            down     ALARM         20

  CONFIGURED EVENT   STATUS        DURATION   LAST NOTIFICATION SENT
  alert              unreachable   15 sec     --
  alarm              unreachable   30 sec     Fri May 12 12:13:06 PDT 2000
  alarm              down          0 sec      --
  alert              error         0 sec      --
  alert              up            20 sec     --
  alert              up            40 sec     --

PACKAGE RECOVERY GROUP prg1      MAINTENANCE MODE NO
  PACKAGE             ROLE        STATUS
  cjc838/primary      primary     down
  cjc1234/recovery    recovery    up
  cjc1234/rehearsal   rehearsal   down

The following is the output of a cmviewconcl command that displays data for a mutual recovery configuration in which each cluster has both the primary and the recovery roles: the primary role for one recovery group and the recovery role for the other recovery group:

CONTINENTAL CLUSTER ccluster1

RECOVERY CLUSTER PTST_sanfran
  PRIMARY CLUSTER   STATUS        EVENT LEVEL   POLLING INTERVAL
  PTST_dts1         Unmonitored   unmonitored   1 min

  CONFIGURED EVENT   STATUS        DURATION   LAST NOTIFICATION SENT
  alert              unreachable   1 min      --
  alert              unreachable   2 min      --
  alarm              unreachable   3 min      --
  alert              down          1 min      --
  alert              down          2 min      --
  alarm              down          3 min      --
  alert              error         0 sec      --
  alert              up            1 min      --

RECOVERY CLUSTER PTST_dts1
  PRIMARY CLUSTER   STATUS        EVENT LEVEL   POLLING INTERVAL
  PTST_sanfran      Unmonitored   unmonitored   1 min

  CONFIGURED EVENT   STATUS        DURATION   LAST NOTIFICATION SENT
  alert              unreachable   1 min      --
  alert              unreachable   2 min      --
  alarm              unreachable   3 min      --
  alert              down          1 min      --
  alert              down          2 min      --
  alarm              down          3 min      --
  alert              error         0 sec      --
  alert              up            1 min      --

PACKAGE RECOVERY GROUP hpgroup10
  PACKAGE                     ROLE       STATUS
  PTST_sanfran/PACKAGE1       primary    down
  PTST_dts1/PACKAGE1          recovery   down

PACKAGE RECOVERY GROUP hpgroup20
  PACKAGE                       ROLE       STATUS
  PTST_dts1/PACKAGE1x_ld        primary    down
  PTST_sanfran/PACKAGE1x_ld     recovery   down

For a more comprehensive status of component clusters, nodes, and packages, use the cmviewcl command on both clusters. On each cluster, make note of which nodes the primary packages are running on, as well as data sender and data receiver packages, if they are being used for logical data replication. Verify that the monitor is running on each cluster on which it is configured. The following is an example of cmviewcl output for a cluster (nycluster) running a monitor package. Note that the recovery package salespkg_bak is not running, and is shown as an unowned package. This is the expected display while the other cluster is running salespkg.
CLUSTER        STATUS
nycluster      up

  NODE         STATUS       STATE
  nynode1      up           running

  Network Parameters:
  INTERFACE    STATUS       PATH       NAME
  PRIMARY      up           12.1       lan0
  PRIMARY      up           56.1       lan2

  NODE         STATUS       STATE
  nynode2      up           running

  Network Parameters:
  INTERFACE    STATUS       PATH       NAME
  PRIMARY      up           4.1        lan0
  PRIMARY      up           56.1       lan1

  PACKAGE      STATUS       STATE      PKG_SWITCH   NODE
  ccmonpkg     up           running    enabled      nynode2

  Script_Parameters:
  ITEM         STATUS   MAX_RESTARTS   RESTARTS   NAME
  Service      up       20             0          ccmonpkg.srv

  Node_Switching_Parameters:
  NODE_TYPE    STATUS       SWITCHING    NAME
  Primary      up           enabled      nynode2    (current)
  Alternate    up           enabled      nynode1

UNOWNED Packages:

  PACKAGE        STATUS       STATE      PKG_SWITCH   NODE
  salespkg_bak   down         unowned

  Policy_Parameters:
  POLICY_NAME     CONFIGURED_VALUE
  Failover        unknown
  Failback        unknown

  Script_Parameters:
  ITEM     STATUS    NODE_NAME    NAME
  Subnet   unknown   nynode1      195.14.171.0
  Subnet   unknown   nynode2      195.14.171.0

  Node_Switching_Parameters:
  NODE_TYPE    STATUS       SWITCHING    NAME
  Primary      down                      nynode1
  Alternate    down                      nynode2

Use the ps command to check the status of the Continentalclusters monitor daemons cmclrmond and cmclsentryd, which should be running on the cluster node where the monitor package is running.

Reviewing Messages and Log Files

The Continentalclusters commands (cmquerycl, cmcheckconcl, cmapplyconcl, and cmrecovercl) all display messages on the standard output, which is the first place to look for error messages.

All notification messages associated with cluster events are reported in /var/opt/resmon/log/cc/eventlog on the cluster where monitoring is taking place. An example of output from this file follows:

>-----Event Monitoring Service Event Notification ------------<
Notification Time: Wed Nov 10 21:00:39 1999
system1 sent Event Monitor notification information:
/cluster/concl/ccluster1/clusters/LAclust/status/unreachable is = 15
User Comments: Cluster "LAclust" has status "unreachable" for 15 sec
>-----End Event Monitoring Service Event Notification ----------<

In addition, if you have defined a TEXTLOG destination, notification messages are sent to the file that was specified. (See “Editing Section 3—Monitoring Definitions” (page 79) for more information.)

Also review the monitor startup and shutdown log file /etc/cmcluster/ccmonpkg/ccmonpkg.cntl.log on any node where a Continentalclusters monitor has been running. Information about the primary or recovery packages may be found in their respective startup and shutdown log files.

Messages from the Continentalclusters daemon are reported in the log file /var/adm/cmconcl/sentryd.log, and Object Manager messages appear in /var/opt/cmom/cmomd.log. These messages may be helpful in troubleshooting. Use the cmreadlog command to view the entries in these files.
Examples:
# /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log slog.txt
# /opt/cmom/tools/bin/cmreadlog -f /var/opt/cmom/cmomd.log \
omlog.txt

The following is sample output from the cmreadlog command for the sentryd.log file:

Oct 20 18:28:22:[[main,5,main]]:FATAL:dr.sentryd:No continental cluster found on this node.
Oct 22 13:38:45:[[Thread-309,5,main]]:ERROR:dr.sentryd:Error connecting to axe28
Oct 22 13:38:45:[[Thread-309,5,main]]:ERROR:dr.sentryd:Connection refused
Oct 22 13:38:45:[[Thread-309,5,main]]:INFO:dr.sentryd:Connection failed to axe28
Oct 22 13:38:45:[[Thread-311,5,main]]:ERROR:dr.sentryd:Cannot find cluster KC-cluster at location axe29
Oct 22 13:38:45:[[Thread-311,5,main]]:ERROR:dr.sentryd:null result from query

General information about Serviceguard operation is found in /var/adm/syslog/syslog.log.

Deleting a Continental Cluster Configuration

The cmdeleteconcl command is used to delete the configuration on all nodes in the continental cluster configuration. To delete a continental cluster and the Continentalclusters configuration:

# cmdeleteconcl

NOTE: If you are modifying the configuration, simply re-issue the cmapplyconcl command; there is no need to delete the previous configuration.

When deleting a continental cluster configured with the recovery group maintenance feature, the shared disk is not removed. Before applying a fresh Continentalclusters configuration using an old shared disk, you must re-initialize the file system on the shared disk using the mkfs command.

Renaming a Continental Cluster

To rename an existing continental cluster, perform the following steps:

1. Remove the continental clusters configuration.
   # cmdeleteconcl

2. Edit the CONTINENTAL_CLUSTER_NAME field in the configuration ASCII file, and run the cmapplyconcl command to configure the continental cluster with the new name.

Checking Java File Versions

Some components of Continentalclusters are executed from Java .jar files. To obtain version information about these files, use the what.sh script provided in the /opt/cmconcl/jar directory.

Example:
# /opt/cmconcl/jar/what.sh configcl.jar

Next Steps

To implement the continental cluster design using physical data replication, use the procedures in the following chapters:
• Chapter 3: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP” (page 133)
• Chapter 4: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA” (page 185)
• Chapter 5: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF” (page 227)

Support for Oracle RAC Instances in a Continentalclusters Environment

Support for Oracle RAC instances means that the RAC instances running on the primary cluster will be restarted by Continentalclusters on the recovery cluster to continue serving the clients' database requests upon a primary cluster failure. Figure 2-11 is a sample of Oracle RAC instances running in the Continentalclusters environment.
Figure 2-11 Oracle RAC Instances in a Continentalclusters Environment

[Figure: a Los Angeles primary Serviceguard cluster running Oracle RAC (LAnode1 and LAnode2 running RAC Instance1 and RAC Instance2 on an XP disk array) connected over a WAN and a highly available network to a New York secondary Serviceguard cluster (NYnode1 and NYnode2 with an XP disk array), with Continuous Access XP, Continuous Access EVA, or EMC SRDF data replication between the arrays.]

As shown in the above example, Oracle RAC instances are configured to run in Serviceguard packages. The instance packages run on the primary cluster and are recovered on the recovery cluster upon a primary cluster failure. Figure 2-12 shows a recovery using an Oracle RAC configuration after failover.

Oracle RAC instances are supported in the Continentalclusters environment only for physical replication using HP StorageWorks Continuous Access XP or EMC Symmetrix Remote Data Facility (SRDF), with HP SLVM or Serviceguard Storage Management Suite using CFS for volume management. Continentalclusters support for Oracle instances using HP StorageWorks Continuous Access EVA is supported only with SLVM software.

Continentalclusters Oracle RAC support is available for a cluster environment configured with only Serviceguard (for example, an environment running Oracle 9i), or a cluster environment configured with Serviceguard plus Oracle Clusterware (for example, an environment running Oracle 10g). Starting with Continentalclusters version A.05.01, recovery of an Oracle RAC instance in a cluster environment running Serviceguard and Oracle Clusterware is supported. A special configuration is required for an environment running both Oracle Clusterware and Serviceguard/Serviceguard Extension for RAC (SGeRAC) for Continentalclusters RAC instance recovery protection. For more information, refer to the section “Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration” (page 126).

Figure 2-12 Sample Oracle RAC Instances in a Continentalclusters Environment After Failover

[Figure: after the Los Angeles primary Serviceguard cluster fails, RAC Instance1 and RAC Instance2 run on the New York secondary Serviceguard cluster (NYnode1 and NYnode2) with its XP disk array, with Continuous Access XP, Continuous Access EVA, or EMC SRDF data replication between the sites.]

Configuring the Environment for Continentalclusters to Support Oracle RAC

In order to enable Continentalclusters support for Oracle RAC, a set of configurations is needed, which includes either Continuous Access XP, Continuous Access EVA, or EMC SRDF, as well as Oracle RAC and Continentalclusters. To support this feature, Continentalclusters must be configured with an environment that has physical replication set up using HP StorageWorks Continuous Access XP, HP StorageWorks Continuous Access EVA, or EMC Symmetrix Remote Data Facility (SRDF), using SLVM, Cluster Volume Manager (CVM), or Cluster File System (CFS) for volume management. For more information on specific Oracle RAC configurations that are supported, refer to Table 2-8. For complete installation and configuration information for Oracle and HP StorageWorks products, refer to the Oracle RAC and HP StorageWorks manuals.
Table 2-8 describes configuration information for RAC support in Continentalclusters.

Table 2-8 Supported Continentalclusters and RAC Configurations

Disk Arrays: HP StorageWorks XP Series with Continuous Access
  Oracle RAC: with or without Clusterware
  Volume Managers: HP SLVM; Serviceguard Storage Management CVM
  Cluster File System: Serviceguard Storage Management Suite CFS
  Required Metrocluster version: Metrocluster with Continuous Access XP

Disk Arrays: HP StorageWorks EVA series with Continuous Access
  Oracle RAC: with or without Clusterware
  Volume Managers: HP SLVM; Serviceguard Storage Management CVM
  Cluster File System: Serviceguard Storage Management Suite CFS
  Required Metrocluster version: Metrocluster with Continuous Access EVA

Disk Arrays: EMC Symmetrix series with SRDF
  Oracle RAC: with or without Clusterware
  Volume Managers: HP SLVM; Serviceguard Storage Management CVM
  Cluster File System: Serviceguard Storage Management Suite CFS
  Required Metrocluster version: Metrocluster with EMC SRDF

Use the following set of procedures to enable Continentalclusters recovery support for Oracle RAC instances:

1. Configure either Continuous Access XP, Continuous Access EVA, or EMC SRDF for data replication between the disk arrays associated with the primary and recovery clusters. For more details, see Chapter 3: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP”, Chapter 4: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA”, or Chapter 5: “Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF”.

2. Configure the database storage using one of the following software products:
   • Shared Logical Volume Manager (SLVM)
   • Cluster Volume Manager (CVM)
   • Cluster File System (CFS)

   You need to configure the SLVM volume groups or CVM disk groups on the disk arrays to store the Oracle database. Configure the volume groups or disk groups on both primary and recovery clusters. Ensure that the volume group names or disk group names on both clusters are identical. You must also set up data replication between the disk arrays associated with the primary and recovery clusters. Only the volume groups or disk groups configured to store the database must be configured for replication across the primary and recovery clusters. In an environment running with Oracle Clusterware, you must configure the storage used by Oracle Clusterware to reside on disks that are not replicated.

   If you use CVM or CFS in your environment for the storage infrastructure, you need to complete the following steps at both the primary and recovery clusters.

   a. Make sure that the primary and recovery clusters are running.

   b. Configure and start the CFS or CVM multi-node package using the command cfscluster config -s. When CVM starts, it automatically selects the master node. This master node is the node from which you must issue the disk group configuration commands. To determine the master node, run the following command from any node in the cluster.
      # vxdctl -c mode

   c. Create disk groups and mount points. For more information on creating disk groups and mount points, refer to the Using Serviceguard Extension for RAC user’s guide.

      NOTE: When you use CVM disk groups, Continentalclusters does not support configuring the CVM disk groups in the RAC instance package files using the CVM_ACTIVATION_CMD and CVM_DISK_GROUP variables. The instance packages should be configured to have a dependency on the required CVM disk group multi-node package.
   d. Run the following CFS script commands to add and configure the disk group and file system mount point multi-node packages (MNPs) in the clusters. These multi-node packages manage the disk group and mount-point activities in the cluster.
      • cfsdgadm add <disk group name> all=SW
        For example: cfsdgadm add racdgl all=SW
      • cfsmntadm add <disk group name> <volume name> <mount point> all=SW
        For example: cfsmntadm add racdgl vol4 /cfs/mntl all=SW

   e. Set the AUTO_RUN flag to NO with the following commands:
      • cfsdgadm set_autorun <disk group name> NO
      • cfsmntadm set_autorun <mount point name> NO

   f. Activate the disk group MNP using the following command:
      cfsdgadm activate <disk group name>

   g. Start the mount point MNP using the following command:
      cfsmount <mount point>

      NOTE: After you configure the disk group and mount point multi-node packages, you must deactivate the packages on the recovery cluster. During a recovery process, the cmrecovercl command automatically activates these multi-node packages.

   h. Set the access rights for volumes and disk groups to persistent using the following command:
      vxedit -g <disk group name> set user=<user> group=<group> mode=<mode> <volume name>

      This step is required because when you import disks or volume groups at the recovery site, the access rights for the imported disks or volume groups are set to root by default. As a result, the database instances do not start. To eliminate this behavior, you must set the access rights to persistent.

3. Configure Oracle RAC. You need to configure all the database files to reside on the SLVM volume groups, CVM disk groups, or CFS file systems that you have configured in your environment. Ensure that the configuration of the Oracle RAC instances that must be recovered in the Continentalclusters environment is identical on the primary and recovery clusters. For more information on configuring Oracle RAC, refer to the Oracle RAC installation and configuration user’s guide. If you have Oracle Clusterware and Serviceguard running in your environment, you need to complete certain additional configuration procedures. For more information on these configuration procedures, see “Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration” (page 126).

4. Configure Continentalclusters. For more information on configuring Continentalclusters, see “Building the Continentalclusters Configuration” (page 65).

5. Configure the Oracle RAC instances in Serviceguard packages. Continentalclusters supports recovery only for applications running in Serviceguard packages. In a multiple recovery pair scenario, where more than one primary cluster shares the same recovery cluster, the primary RAC instance package name must be unique on each primary cluster. Configure the Oracle RAC instance packages on both primary and recovery clusters based on the number of RAC instances configured to run on that cluster. Ensure that the same number of Oracle RAC instances is configured on both the primary and recovery clusters. This ensures Continentalclusters recovery protection. Set the AUTO_RUN parameter in the package configuration file to NO. For details on how to configure an Oracle RAC instance in a Serviceguard package, refer to the Using Serviceguard Extension for RAC user’s guide. In the Continentalclusters environment, you can configure each RAC instance in a failover type package, or you can configure all RAC instances in a single multi-node package.

6. Set up the environment file.
   Instead of one environment file for each continental cluster application package, there is only one environment file for each set of Oracle RAC instance packages accessing the same database. This file can be located anywhere except the directory where the Oracle RAC instance package configuration and control files reside. Only one environment file can reside under one directory. The setup of the file is the same as that described in the section “Physical Data Replication using Special Environment files” (page 54) of this chapter, with the exception of the PKGDIR variable. The value of the PKGDIR variable must be the directory where this environment file resides.

   For specific information on how to set up the environment file, see Chapter 3 under the section “Configuring Packages for Disaster Recovery”, Chapter 4 under the section “Configuring Packages for Automatic Disaster Recovery”, or Chapter 5 under the section “Configuring Serviceguard Packages for Automatic Disaster Recovery”. Be sure to place this environment file in the same path on all nodes of both the primary and recovery clusters in a recovery pair. You must name the environment file using your package name as the prefix, for example <package name>_xpca.env. You must uncomment all the AUTO variables in the environment file. Based on the disk arrays in your environment, refer to the corresponding chapters of this manual for more information on configuring the environment file for your storage.

7. Set up the Continentalclusters Oracle RAC specification file. The existence of the file /etc/cmconcl/ccrac/ccrac.config serves as an enabler for Continentalclusters Oracle RAC support. A template of this file is available in the /opt/cmconcl/scripts directory. Edit this file to suit your environment. After editing, move the file to /etc/cmconcl/ccrac/ccrac.config on all nodes in the participating clusters. Use the following steps to set up the file:

   a. Log in as root on one node of the primary cluster.

   b. Change to your own directory:
      # cd <your directory>

   c. Copy the file:
      # cp /opt/cmconcl/scripts/ccrac.config \
      ccrac.config.mycopy

   d. Edit the file ccrac.config.mycopy to fit your environment. The following parameters need to be edited:

      CCRAC_ENV - fully qualified Metrocluster environment file name. The file naming convention is required by the Metrocluster software: the name has to end with _<data replication scheme>.env, where <data replication scheme> is the name of the data replication scheme being used. Refer to the Metrocluster documents for the environment file naming convention. This parameter is mandatory.

      CCRAC_SLVM_VGS - SLVM volume groups configured for the device specified in the above environment file for the variable DEVICE_GROUP. These are the volume groups used by the associated RAC instance packages. It is important that all of the volume groups configured for the specified DEVICE_GROUP are listed. If only some of the configured volume groups are listed, the device will not be prepared properly and the storage will end up in an inconsistent state. This parameter is mandatory when SLVM volume groups are used. This parameter should not be declared when only CVM disk groups are used.

      CCRAC_CVM_DGS - CVM disk groups configured for the device specified in the above environment file for the variable DEVICE_GROUP. These are the disk groups used by the associated RAC instance packages. It is important that all of the disk groups configured for the specified DEVICE_GROUP are listed.
      If only some of the configured disk groups are listed, the device will not be prepared properly and the storage will end up in an inconsistent state. This parameter is mandatory when CVM disk groups or CFS are used. This parameter cannot be declared when SLVM volume groups are used.

      CCRAC_INSTANCE_PKGS - the names of the configured RAC instance packages accessing in parallel the database stored in the specified volume groups. This parameter is mandatory.

      CCRAC_CLUSTER - the Serviceguard cluster name configured as the primary cluster of the corresponding RAC instance package set. This parameter is mandatory.

      CCRAC_ENV_LOG - log file specification for the storage preparation output. This parameter is optional. If not specified, ${CCRAC_ENV}.log will be used.

      Sample setup:

      CCRAC_ENV[0]=/etc/cmconcl/ccrac/db1/db1EnvFile_xpca.env
      CCRAC_SLVM_VGS[0]=ccracvg1 ccracvg2
      CCRAC_INSTANCE_PKGS[0]=ccracPkg1 ccracPkg2
      CCRAC_CLUSTER[0]=PriCluster1
      CCRAC_ENV_LOG[0]=/tmp/db1_prep.log

      (Multiple values for CCRAC_SLVM_VGS and CCRAC_INSTANCE_PKGS should be separated by spaces.)

      If multiple sets of Oracle instances accessing different databases are configured in your environment and need Continentalclusters recovery support, repeat this set of parameters with an incremented index. For example:

      CCRAC_ENV[0]=/etc/cmconcl/ccrac/db1/db1EnvFile_xpca.env
      CCRAC_SLVM_VGS[0]=ccracvg1 ccracvg2
      CCRAC_INSTANCE_PKGS[0]=ccracPkg1 ccracPkg2
      CCRAC_CLUSTER[0]=PriCluster1
      CCRAC_ENV_LOG[0]=/tmp/db1_prep.log

      CCRAC_ENV[1]=/etc/cmconcl/ccrac/db2/db2EnvFile_srdf.env
      CCRAC_CVM_DGS[1]=racdg01 racdg02
      CCRAC_INSTANCE_PKGS[1]=ccracPkg3 ccracPkg4
      CCRAC_CLUSTER[1]=PriCluster2
      CCRAC_ENV_LOG[1]=/tmp/db2_prep.log

      CCRAC_ENV[2]=/etc/cmconcl/ccrac/db3/db3EnvFile_xpca.env
      CCRAC_SLVM_VGS[2]=ccracvg5 ccracvg6
      CCRAC_INSTANCE_PKGS[2]=ccracPkg5 ccracPkg6
      CCRAC_CLUSTER[2]=PriCluster2

   e. Copy the edited file to the final location:
      # cp ccrac.config.mycopy \
      /etc/cmconcl/ccrac/ccrac.config

   f. Copy the file /etc/cmconcl/ccrac/ccrac.config to all the other nodes of the cluster.

   g. Log in as root on one node of the recovery cluster and repeat steps “b” through “f” above. If the recovery cluster is configured to recover the Oracle RAC instances for more than one primary cluster, the ccrac.config file on the recovery cluster should contain information for all the primary clusters.

8. Configure a Continentalclusters recovery group for each Oracle RAC instance package. If you are using an individual package for each RAC instance, define one recovery group for each Oracle RAC instance recovery. The PRIMARY_PACKAGE specified for the Oracle RAC instance recovery group is the name of the instance package configured on the primary cluster. The RECOVERY_PACKAGE specified for the RAC instance recovery group is the corresponding instance package name configured on the recovery cluster. For example:

   RECOVERY_GROUP_NAME instanceRG1
   PRIMARY_PACKAGE ClusterA/instancepkg1
   RECOVERY_PACKAGE ClusterB/instancepkg1'

   RECOVERY_GROUP_NAME instanceRG2
   PRIMARY_PACKAGE ClusterA/instancepkg2
   RECOVERY_PACKAGE ClusterB/instancepkg2'

   Packages instancepkg1 and instancepkg2 are configured to run on the primary cluster “ClusterA”. Packages instancepkg1' and instancepkg2' are configured to be restarted or recovered on the recovery cluster “ClusterB” upon a primary cluster failure.

   If you are using one multi-node package for all RAC instances, define only one recovery group for the RAC MNP package.
   For example:

   RECOVERY_GROUP_NAME manufacturing_recovery
   PRIMARY_PACKAGE ClusterA/man_rac_mnp
   RECOVERY_PACKAGE ClusterB/man_rac_mnp

   When recovering a recovery group with multi-node packages, Continentalclusters will start an instance on each cluster node configured in the MNP.

   After editing the Continentalclusters configuration file to add the recovery group specification for the Oracle RAC instance packages, you must manually apply the new configuration by running the cmapplyconcl command. When you finish configuring a recovery pair with RAC support, your systems must have sets of files similar to those shown in Figure 2-13.

NOTE: If you are configuring Oracle RAC instances in Serviceguard packages in a CFS or CVM environment, do not specify the CVM_DISK_GROUPS and CVM_ACTIVATION_CMD fields in the package control scripts, as CVM disk group manipulation is addressed by the disk group multi-node package.

Figure 2-13 Continentalclusters Configuration Files in a Recovery Pair with RAC Support

[Figure: each node of the Los Angeles (primary) cluster, LAnode1 and LAnode2, holds the primary package files (RACinstance1.config, RACinstance1.cntl, RACinstance2.config, RACinstance2.cntl); each node of the New York (recovery) cluster, NYnode1 and NYnode2, holds the recovery package files (RACinstance1_bak.config, RACinstance1_bak.cntl, RACinstance2_bak.config, RACinstance2_bak.cntl). In addition, every node in both clusters holds the Continentalclusters configuration file cmconcl.config, the Continentalclusters monitor package files ccmonpkg.config and ccmonpkg.cntl, the Continentalclusters RAC specification file /etc/cmconcl/ccrac/ccrac.config, the storage environment file /etc/cmcluster/ccrac/db1/db1EnvFile_xpca.env, and the managed object files /etc/cmconcl/instances/*.]

Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration

The following configurations are required for Continentalclusters RAC instance recovery support in a cluster environment running Serviceguard/Serviceguard Extension for RAC and CRS (Oracle Cluster Software):

1. The Oracle RAC environment running with Serviceguard/Serviceguard Extension for RAC and Oracle Cluster Software should follow all the recommendations listed in the Serviceguard and SGeRAC manuals for running with CRS (Oracle Cluster Software).

2. CRS should not activate the volume groups configured for the database automatically at startup time. The file /var/opt/oracle/oravg.conf should not exist on any node of the primary and recovery clusters.

3. The CRS storage (OCR and voting disk) should be configured on a separate volume group from the ones for the databases that are to be accessed by the RAC instances.
4. The RAC instance attribute AUTO_START listed in the CRS service profile should be set to 2 on both primary and recovery clusters so that the instance will not be automatically started when the node rejoins the cluster. Log in as the Oracle administrator and use the following steps to change the attribute value:

   a. Generate the resource profile.
      crs_stat -p instance_name > $CRS_HOME/crs/public/instance_name.cap

   b. Edit the resource profile and set the AUTO_START value to 2.

   c. Register the value.
      crs_register -u instance_name

   d. Verify the value.
      crs_stat -p instance_name

Initial Startup of Oracle RAC Instance in a Continentalclusters Environment

To ensure that the disk array will be ready for access in shared mode for the Oracle RAC instances, it is recommended that you run the Continentalclusters tool /opt/cmconcl/bin/ccrac_mgmt.ksh to initially start up the configured instance packages. This tool ensures that the configured disk array will be ready in writable mode for shared access before starting up the RAC instance packages. If this tool is not used, manual checking is needed to make sure the storage is ready in writable and shared access mode before starting the RAC instance packages.

NOTE: It is recommended that ccrac_mgmt.ksh be used for the initial startup of the RAC instance packages, or for failing back the RAC instance packages. This tool should not be used at the recovery site for recovering RAC instance packages; cmrecovercl is used in that case. After the initial startup, use the Serviceguard commands cmhaltpkg, cmrunpkg, and cmmodpkg as needed to halt and restart the packages on the primary cluster.

Use the following steps on any node of the primary cluster to do the initial startup of the Oracle RAC instance packages:

1. If the cluster is running with Serviceguard and Oracle CRS, make sure that the CRS daemons and the required Oracle services, such as the listener, GSD, ONS, and VIP, are up and running on all the nodes on which the RAC database instances are configured to run.

2. Make sure /etc/cmconcl/ccrac/ccrac.config exists and has been edited to contain the appropriate information.

3. To start all the RAC instance packages configured to run as primary packages on the local cluster:
   # /opt/cmconcl/bin/ccrac_mgmt.ksh start

   To start a specific set of RAC instance packages:
   # /opt/cmconcl/bin/ccrac_mgmt.ksh -i <index> start

   <index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.

4. To stop all the RAC instance packages configured to run as primary packages on the local cluster:
   # /opt/cmconcl/bin/ccrac_mgmt.ksh stop

   To stop a specific set of RAC instance packages:
   # /opt/cmconcl/bin/ccrac_mgmt.ksh -i <index> stop

   <index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.
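For instance, using the sample ccrac.config shown earlier in this section, where index 0 refers to the hypothetical ccracPkg1 and ccracPkg2 package set on PriCluster1 (the index value is specific to your own ccrac.config), starting and later stopping only that set would look like this:

# /opt/cmconcl/bin/ccrac_mgmt.ksh -i 0 start
# /opt/cmconcl/bin/ccrac_mgmt.ksh -i 0 stop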
Support for Oracle RAC Instances in a Continentalclusters Environment 127 If you have configured CFS or CVM in your environment, ensure the following: • The SG-CFS-PKG (system multi-node package) is up and running. The SG-CFS-PKG package is not part of the continentalclusters configuration. • The cmrecovercl command is run from the CVM master node. Use the following command to display the CVM master node: # vxdctl -c mode Starting with Continentalclusters A.07.00, recovery groups of applications using CFS or CVM can be recovered by running the cmrecovercl command from any node at the recovery cluster. NOTE: Make sure that the primary site is unavailable and all of the Oracle RAC instance packages are not running on the primary cluster before initiating the recovery process. The Continentalclusters command, cmrecovercl prepares the configured storage for Oracle RAC instances shared access only when the file /etc/cmconcl/ccrac/ ccrac.config exists. If this file does not exist, the configured storage will not be prepared for shared access before recovering the Oracle RAC instance packages. As a result, if Continentalclusters recovery group configuration includes Oracle RAC instance packages, these packages will not be able to start or operate successfully. The recovery process will startup the configured Oracle RAC instance packages as well as other application packages configured in the Continentalclusters environment. If the Continentalclusters Oracle RAC support is enabled (the /etc/cmconcl/ccrac/ ccrac.config file exists), the following messages will be prompted to the user when the command cmrecovercl is invoked and confirmations are needed for the process to proceed. WARNING: This command will take over for the primary cluster LACluster by starting the recovery package on the recovery cluster NYCluster. You must follow your site disaster recovery procedure to ensure that the primary packages on LACluster are not running and that recovery on NYCluster is necessary. Continuing with this command while the applications are running on the primary cluster may result in data corruption. Are you sure that the primary packages are not running and will not come back, and are you certain that you want to start the recovery packages [y/n]? y cmrecovercl: Attempting to recover Recovery Groups from cluster LACluster. NOTE: The configuration file /etc/cmconcl/ccrac/ccrac.config for cluster shared storage recovery exists. Data storage specified 128 Designing a Continental Cluster in the file for this cluster will be prepared for this recovery process. If you choose "n" - not to prepare the storage for this recovery process, make sure that the required storage for this recovery process has been properly prepared. Is this what you intend to do [y/n]? y The Oracle RAC instance package can be started in sequence. # cmrecovercl -g Option -g is used to start up the first instance package, wait until the disk arrays are synchronized before starting up the second instance package. If option -g is used with the command cmrecovercl, the following messages will be given instead: WARNING: This command will take over for the primary cluster primary_cluster by starting the recovery package on the recovery cluster secondary_cluster. You must follow your site disaster recovery procedure to ensure that the primary packages on primary_cluster are not running and that recovery on secondary_cluster is necessary. 
Continuing with this command while the applications are running on
the primary cluster may result in data corruption.

Are you sure that the primary packages are not running and will not
come back, and are you certain that you want to start the recovery
packages [y/n]? y

cmrecovercl: Attempting to recover Recovery Group subsrecovery1 on
cluster secondary_cluster

NOTE: The configuration file /etc/cmconcl/ccrac/ccrac.config for
cluster shared storage recovery exists. If the primary package in
the target group is configured within this file, the corresponding
data storage will be prepared before starting the recovery package.
If you choose "n" - not to prepare the storage for this recovery
process, make sure that the required storage for the recovery
package has been properly prepared.

Is this what you intend to do [y/n]? y

Enabling recovery package racp-cfs on recovery cluster secondary_cluster
Running package racp-cfs on node atlanta
Successfully started package racp-cfs on node atlanta
Running package racp-cfs on node miami
Successfully started package racp-cfs on node miami
Successfully started package racp-cfs.

cmrecovercl: Completed recovery process for each recovery group.
Recovery packages have been started.
Use cmviewcl or check package log file to verify that the recovery
packages are successfully started.

These message prompts can be disabled by running cmrecovercl with the -y option.

If you have configured the Oracle RAC instance packages such that there is one instance for every package, each instance or recovery group can be recovered individually. If you have configured all instances as a single multi-node package (MNP), recovering the recovery group of this package starts all instances.

NOTE: At recovery time, Continentalclusters is responsible for recovering the configured Oracle RAC instance packages. The data integrity and currency at the recovery site are based on your data replication configuration in the Oracle environment.

Failback of Oracle RAC Instances After a Failover

After failover, the configured disk array at the old recovery cluster becomes the primary storage of the database. The Oracle RAC instances are running at the recovery cluster after a successful recovery. To fail back the Oracle RAC instances to the primary cluster, follow the procedures listed below. Before failing back the Oracle RAC instances, make sure that the data in the original primary site disk array is in an appropriate state. Follow the disk array specific procedures for data resynchronization between the two clusters, and the Oracle RAC failback procedures, before restarting the instances.

NOTE: Make sure the AUTO_RUN flag for all the configured Continentalclusters packages is disabled before restarting the cluster.

1. Fix the problems that caused the primary site failure.

2. Stop the Oracle RAC instance packages running on the recovery cluster. On any node of the recovery cluster:
   # /opt/cmconcl/bin/ccrac_mgmt.ksh stop

   If you have configured CVM or CFS in your environment, you need to complete the following procedure:
Synchronize the data between the two participating clusters. Make sure that the data integrity and the data currency are at the expected level at the primary site. Verify that the primary cluster is up and running. # cmviewcl 5. If the cluster is running with Serviceguard and Oracle CRS, make sure that CRS and the required services, such as listener, GSD, ONS, and, VIP are up and running on all of the instance nodes. By default, when CRS is started, these Oracle services are initiated. NOTE: Ensure that the SG-CFS-PKG (system multi-node) package is running for the CFS/CVM environment. 6. Startup the Oracle RAC instance packages on the primary cluster. If you have configured CFS or CVM in your environment, issue the following command from the master node: # /opt/cmconcl/bin/ccrac_mgmt.ksh start Alternatively, you can run the command on any node in the primary cluster. This command fails back all of the RAC instance packages configured to adopt to this cluster as the primary cluster. To failback only a specific set of the Oracle RAC instance package set. # /opt/cmconcl/bin/ccrac_mgmt.ksh [-i ] \ start is the index used in the/etc/cmconcl/ccrac/ccrac.config file for the target set of the Oracle RAC instance packages. Rehearsing Oracle RAC Databases in Continentalclusters Special precaution is required for running disaster recovery (DR) rehearsal for Oracle RAC databases. For information on configuring and running rehearsal for RAC databases, see Disaster Recovery Rehearsal in Continentalclusters whitepaper available at: http://www.docs.hp.com. Support for Oracle RAC Instances in a Continentalclusters Environment 131 132 3 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP The HP StorageWorks Disk Array XP Series allows you to configure data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access XP software and the additional files that integrate the XP with Serviceguard clusters. It then shows how to configure both metropolitan and continental cluster solutions using Continuous Access XP. The topics discussed in this chapter are: • • • • • • Files for Integrating XP Disk Arrays with Serviceguard Clusters Overview of Continuous Access XP Concepts Creating the Cluster Preparing the Cluster for Data Replication Configuring Packages for Disaster Recovery Completing and Running a Metrocluster Solution with Continuous Access XP Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature called the Site Controller package to provide disaster tolerance for workload databases. This solution is currently implemented for the Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323). Files for Integrating XP Disk Arrays with Serviceguard Clusters Metrocluster is a set of executable programs and an environmental file that work in an Serviceguard cluster to automate failover to alternate nodes in the case of disaster in metropolitan cluster. The Metrocluster/Continuous Access product contains the following files. Table 3-1 Metrocluster/Continuous Access Template Files Name Description /opt/cmcluster/toolkit/SGCA/xpca.env The Metrocluster/Continuous Access environmental file. 
  This file must be customized for the specific Disk Array XP Series and HP host system configuration. Copies of this file must be customized for each separate Serviceguard package.

Name: /usr/sbin/DRCheckDiskStatus
Description: The executable module that checks for a specific environment file in the package directory and should not be edited.

Table 3-1 Metrocluster/Continuous Access Template Files (continued)

Name: /usr/sbin/DRCheckXPCADevGrp
Description: The program that checks the status of the XP/Continuous Access device group that is used by the package.

Name: /usr/sbin/DRMonitorXPCADevGrp
Description: The program that monitors the status of the XP/Continuous Access package device group, sends notification, and performs pre-defined actions on the device group.

Metrocluster/Continuous Access needs to be installed on all nodes that will run a Serviceguard package whose data are on an HP StorageWorks Disk Array XP Series, and where the data is replicated to a second XP using the Continuous Access XP facility. In the event of node failure, the integration of Metrocluster/Continuous Access with the package will allow the application to fail over in the following ways:
• among local host systems attached to the same XP Series array
• between one system that is attached locally to its XP and another “remote” host that is attached locally to the other XP

Configuration of Metrocluster/Continuous Access must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster/Continuous Access, Raid Manager XP host-based software for control and status of the XP Series arrays must also be installed and configured on each HP 9000 or Integrity host system that might execute the application package.

Overview of Continuous Access XP Concepts

The HP StorageWorks Disk Array XP Series may be configured for use in data replication from one XP series unit to another. This type of physical data replication is a part of the Metrocluster/Continuous Access and Continentalclusters solutions. This section describes the hardware and software concepts necessary for understanding how to use Continuous Access software for physical data replication in disaster tolerant solutions.

PVOLs and SVOLs

Continuous Access allows you to define primary and secondary volumes that are redundant copies of one another, as shown in Figure 3-1.

Figure 3-1 XP Series Primary and Secondary Volume Definitions

[Figure: two XP Series arrays, one in Data Center A and one in Data Center B, connected by redundant Continuous Access links with DWDM. Each array holds PVOLs and SVOLs with optional business copies (BC1 through BC4), and there may be multiple P/S devices. In a continental cluster, Continuous Access links may be bidirectional. Packages with primary nodes in Data Center A see that XP as the primary side and the XP in Data Center B as the secondary side, and vice versa.]

Data replication proceeds from PVOL to SVOL. When failover is necessary, the SVOL can be changed into a PVOL for access by a package on the failover node.

Device Groups and Fence Levels

A device group is the set of XP devices that are used by a given package. The device group is the basis on which PVOLs and SVOLs are created. The fence level of the device group is set when you define it. All devices defined in a given device group must be configured with the same fence level. A fence level of DATA or NEVER results in synchronous data replication; a fence level of ASYNC is used to enable asynchronous data replication.

Fence Level of NEVER

Fence level = NEVER should only be used when the availability of the application is more important than the data currency on the remote XP disk array. In the case when all Continuous Access links fail, the application continues to modify the data on the PVOL side, but the new data is not replicated to the SVOL side. The SVOL then contains a copy of the data only up to the point of the Continuous Access link failure. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data. If Fence level = NEVER is used, the data may be inconsistent in the case of a rolling disaster, that is, additional failures taking place before the system has completely recovered from a previous failure. See an example of a rolling disaster in the following section, “Fence Level of DATA”.
All devices defined in a given device group must be configured with the same fence level. A fence level of DATA or NEVER results in synchronous data replication; a fence level of ASYNC is used to enable asynchronous data replication. Fence Level of NEVER Fence level = NEVER should only be used when the availability of the application is more important than the data currency on the remote XP disk array. In the case when all Continuous Access links fail, the application will continue to modify the data on PVOL side, however the new data is not replicated to the SVOL side. The SVOL only contains a copy of the data up to the point of Continuous Access links failure. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data. If Fence level = NEVER is used, the data may be inconsistent in the case of a rolling disaster—additional failures taking place before the system has completely recovered from a previous failure. See an example of rolling disaster in the following section “Fence Level of DATA”. Overview of Continuous Access XP Concepts 135 Fence Level of DATA Fence level = DATA is recommended to ensure a current and consistent copy of the data on all sides. If Fence level = DATA is not enabled, the data may be inconsistent in the case of a rolling disaster—additional failures taking place before the system has completely recovered from a previous failure.Fence level = DATA is recommended, in case of Continuous Access link failure, to ensure there is no possibility of inconsistent data at the SVOL side. Since only dedicated Continuous Access links are supported, the probability of intermittent link failure and inconsistent data at the remote (SVOL) side is extremely low. Additionally, if the following sequence of events occur, it will cause inconsistent and therefore unusable data: • Fence level = DATA is not enabled. • The Continuous Access links fail. • The application continues to modify data. • The link is restored. • Resynchronization from PVOL to SVOL starts, but does not finish. • The PVOL side fails Although the risk of this sequence of events taking place is extremely low, if your business cannot afford a minimal level of risk, then enable Fence level = DATA to ensure that the data at the SVOL side are always consistent. The disadvantage of enabling Fence level = DATA is when the Continuous Access link fails, or if the entire remote (SVOL) data center fails, all I/Os will be refused (to those devices) until the Continuous Access link is restored, or manual intervention is used to split the PVOL side from the SVOL side. NOTE: Using manual intervention will allow the application to write the data to the PVOL side without replicating the data to SVOL side. The data may be inconsistent in the case of a rolling disaster. See the above example. Applications may fail or may continuously retry the I/Os (depending on the application) if Fence level = DATA is enabled and the Continuous Access link fails. NOTE: If data currency is required on all sides, Fence level = DATA should be used and manual intervention should NOT be taken when the Continuous Access links fail. Fence Level of ASYNC Fence level = ASYNC is recommended to improve performance in data replication between the primary and the remote site. The XP disk array supports asynchronous mode with guaranteed ordering. 
When the host does a write I/O to the XP disk array, as soon as the data is written to cache, the array sends a reply to the host. A copy of the data with a sequence number is saved in an internal buffer, known as the side file, for later transmission to the remote XP disk 136 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP array. When synchronous replication is used, the primary system cannot complete a transaction until a message is received acknowledging that data has been written to the remote site. With asynchronous replication, the transaction is completed once the data is written to the side file on the primary system, which allows I/O activity to continue even if the Continuous Access link is temporarily unavailable. The side file is 30% to 70% of cache (default 50%) that is assigned through the XP system’s Service Processor (SVP). The high water mark (HWM) is 30% of the cache as shown in Figure 3-2. However, if the quantity of data in the side file exceeds 30% then the write I/O to the side file will be delayed. The delay can be from .5 seconds to a maximum of 4 seconds, in 500 ms increments, with every 5% increase over the HWM. If the HWM continues to grow, it will eventually hit the side file threshold of 30% to 70% cache. When this limit has been reached, the XP on the primary site cannot write to the XP on the secondary site until there is enough room in the side file. The primary XP will wait until there is enough room in the side file before continuing to write. Furthermore, the primary XP will keep trying until it reaches its side file timeout value, which is configured through the SVP. If the side file timeout has been reached, then the primary XP disk array will begin tracking data on its bitmap that will be copied over to the secondary volume during resync. Figure 3-2 depicts the side file operation. Figure 3-2 XP Series Disk Array Side File Side File Side File Area Cache High Water Mark (30% of cache) Writing responses are delayed Writing waits until the quantity of data is under threshold unless timeout has been reached Overview of Continuous Access XP Concepts 137 NOTE: The side file must be configured using the XP Service Processor (SVP). Refer to the XP Series documentation for details. In case all the Continuous Access links fail, the remaining data in the side file that has not been copied over to the SVOL will be tracked in the bit map. The application continues to modify the data on the PVOL, which will also be tracked in the bit map. The SVOL only contains a copy of the data up to the point the failure of the Continuous Access links. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data. Continuous Access Link Timeout In asynchronous mode, when there is an Continuous Access link failure, both the PVOL and SVOL sides change to a PSUE state. When the SVOL side detects missing data blocks from the PVOL side, it will wait for those data blocks from the PVOL side until it has reached the configured Continuous Access link timeout value (set in the SVP). Once this timeout value has been reached, then the SVOL side will change to a PSUE state. The default Continuous Access link timeout value is 5 minutes (300 seconds). Consistency Group An important property of asynchronous mode volumes is the Consistency Group (CT group). 
A CT group is a grouping of LUNs that need to be treated the same from the perspective of data consistency (I/O ordering). A CT group is equal to a device group in the Raid Manager configuration file. A consistency group ID (CTGID) is assigned automatically during pair creation.
NOTE: Different XP models have different maximum numbers of Consistency Groups. For details, check the XP user's guide.
Limitations of Asynchronous Mode
The following are restrictions for an asynchronous CT group in a Raid Manager configuration file:
• Asynchronous device groups cannot be defined to extend across multiple XP Series disk arrays.
• When making paired volumes, the Raid Manager registers a CTGID to the XP Series disk array automatically at paircreate time, and the device group in the configuration file is mapped to a CTGID. Attempts to create a CTGID higher than the maximum number will be terminated with a return value of EX_ENOCTG.
• Metrocluster/Continuous Access supports only one consistency group per package. This is based on the number of packages, in a metropolitan cluster, that can be configured to use a consistency group. Furthermore, the number of packages that can be configured to use a consistency group is limited by either the maximum number of consistency groups that are supported by the XP model in the configuration, or the maximum number of packages in the cluster (whichever is smaller).
Other Considerations on Asynchronous Mode
The following are some additional considerations when using asynchronous mode:
• When adding a new volume to an existing device group, the new volume state is SMPL. The XP disk array controller (DKC) is smart enough to do the paircreate only on the new volume. If the device group has mixed volume states like PAIR and SMPL, pairvolchk returns EX_ENQVOL, and horctakeover will fail.
• If you change the LDEV number associated with a given target/LUN, you must restart all the Raid Manager instances even though the Raid Manager configuration file is not modified.
• Any firmware update, cache expansion, or board change requires a restart of all Raid Manager instances.
• pairsplit for asynchronous mode may take a long time depending on how long the synchronization takes. There is a potential for the Continuous Access link to fail while pairsplit is in progress. If this happens, pairsplit will fail with a return code of EX_EWSUSE.
• In most cases, Metrocluster/Continuous Access in asynchronous mode will behave the same as when the fence level is set to NEVER in synchronous mode.
Continuous Access Journal Overview
Continuous Access XP Journal is an asynchronous data replication solution between two HP XP12000 or HP XP10000 storage disk arrays. As depicted in Figure 3-3, Continuous Access Journal uses two main features, “disk-based journaling” and “pull-style replication”. These two features reduce XP12000 internal cache memory consumption, while maintaining performance and operational resilience.
Figure 3-3 Journal Based Replication
Continuous Access Journal performs remote copy operations for data volume pairs. Each Continuous Access Journal pair consists of primary data volumes (PVOL) and secondary data volumes (SVOL) which are located in different storage arrays. The Continuous Access Journal PVOL contains the original data, and the SVOL contains the duplicate data.
During normal data replication operations, the PVOL remains available to all hosts at all times for read and write I/O operations. During normal data replication operations, the storage array rejects all host-requested write I/Os for the SVOL. The SVOL write enable option allows write access to a secondary data volume while the pair is split and uses the SVOL and PVOL track maps to resynchronize the pair. Journal Volume When Continuous Access Journal is used, updates to PVOL can be stored in other volumes, which are called journal volumes. The update data that will be stored in journal volumes are called journal data. Figure 3-3 depicts Continuous Access Journal data replication for disk-based journaling in which the data volumes at the primary data center are being replicated to a secondary storage array at the remote data center. When collecting the data to be replicated, the primary XP12000 array writes the designated records to a special set of journal volumes. The remote storage array then reads the records from the journal volumes, pulling them across the communication link as described in the next section “Pull-Based Replication”. By writing the records to journal disks instead of keeping them in cache, Continuous Access Journal overcomes the limitations of earlier asynchronous replication methods. Writes to the journal are cached for application, but are quickly de-staged to disk to minimize cache usage. The journal volumes are architected and optimized for keeping large amounts of host-write data in sequence. 140 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP In addition to the records being replicated, the journal contains metadata for each record to ensure the integrity and consistency of the replication process. Each transmitted record set includes both time stamp and sequence number information, which enables the replication process to verify that all the records are received at the remote site, and to arrange them in the correct write order for storage. These processes build on the proven algorithms of XP Continuous Access Asynchronous Data Replication. The journaling and replication processes also support consistency across multiple volumes. Pull-Based Replication In addition to disk-based journaling, Continuous Access Journal uses pull-style replication. The primary storage system does not dedicate resources to pushing data across the replication link. Rather, a replication process on the remote system pulls the data from the primary system's journal volume, across the Continuous Access link, and writes it to the journal volume at the receiving site. The replication process then applies the journaled writes to the remote data volumes, using metadata and consistency algorithms to ensure data integrity. In the default configuration, Continuous Access Journal considers replication complete when the data is received in mirrored system cache at the remote system, written to the journal disk, and applied to the remote data volumes.Since the process that controls asynchronous replication is located on the remote system, this approach shifts most of the replication workload to the remote site, reducing resource consumption on the primary storage system. In effect, Continuous Access Journal restores primary site storage to its intended role as a transaction processing resource, not a replication engine. The pull-style replication engine also contributes to resource optimization. 
It controls the replication process from the secondary system and frees up valuable production resources on the primary system. Mitigation of Network Problems In Continuous Access Asynchronous replication, typical issues include temporary communication problems, such as Continuous Access link failure or insufficient bandwidth for peak-load requirements. These conditions can cause cache-based “push” replication methods to fail. When this happens, traditional replication solutions suspend the replication process and go into bitmap mode, noting changed tracks in a bitmap for future resynchronization. Recovery typically involves a destructive process such as rewriting all the changed tracks, with possible loss of data consistency for ordered writes. In contrast, Continuous Access Journal logs every change to the journal disk at the primary site, including the metadata needed to apply the changes consistently. Should the replication link between sites fail, Continuous Access Journal keeps logging changes in the local journal so that they can be transmitted later, without interruption to the protection process or the application. The journal data is simply transferred after the network link failure or bandwidth limitation is corrected, with no loss of consistency. The recovery time may be extended a bit during temporary link failures or congestion, Overview of Continuous Access XP Concepts 141 but the asynchronous replication process does not fail, and the catch-up process is simple and automatic. Data consistency is preserved. With Continuous Access Journal, the remote storage system pulls data from the primary journal volumes over the data replication network as fast as the bandwidth allows while adjusting to available network conditions. If available bandwidth does not support optimal replication, such as during peak-load spikes in transaction volume, the primary journal volumes buffer the data on disk until more bandwidth becomes available. Fence Level The Continuous Access Journal has the asynchronous data replication characteristic. In XP12000, the fence level of the Continuous Access journal is defined to “async”, the same as the Continuous Access Asynchronous fence level. Journal Group The journal group is a component of the Continuous Access Journal operations that consists of two or more data and journal volumes. The data update sequence from the host is managed per the journal group. This ensures the data update sequence consistency between the paired journal groups is maintained. Journal groups are managed according to the journal group number. The paired journal numbers of journal groups can be different. One journal group can have more than one data volume and journal volume belong to it. Journal Cache, Journal Volumes, and Inflow Control When a primary array performs an update (host-requested write I/O) on PVOL, the primary array creates the journal data (metadata and new write data) to be transferred to secondary array. The journal data is stored in the journal cache or journal volumes depending on an amount of data in cache. If available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volumes. A secondary array receives the journal data that is transferred from the primary array according to the read journal command. The received journal data is stored in the journal cache or the journal volumes depending on the “Use of Cache” parameter and/or amount of data in cache. 
If the “Use of Cache” is set to “Use”, journal data will be stored into the journal cache. If it is set to “No Use”, journal data will bypass the cache and move directly to the journal volumes. In addition, if available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volume. For Continuous Access Journal processing, Continuous Access Journal allows the usage rate of journal volume to be specified. The Journal volume stores journal data to be transferred to the secondary array asynchronously using host write I/Os to PVOL. However, if the hosts transfer excessive amounts of data, the journal volume may become full. Consequently, if the journal volumes remains full for the specified period of time, the journal group will be suspended due to a failure. To specify the period of 142 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP time for how long the journal volume can remain full, use the Data Overflow Watch option. The XP12000 array uses the following parameters to control the inflow of data into journal group and state change of the journal group: • Inflow Control: Indicates whether to restrict inflow of update I/Os (slow down the host response) to the journal volume, when the journal volume is full. ‘Yes’ indicates inflow will be restricted. ‘No’ indicates inflow will not be restricted. • Data Overflow Watch: Indicates the time (in seconds) to implement Inflow Control before suspending the journal group. If the amount of data in the journal volume, in the primary array, reaches the capacity, the disk array I/Os will be delayed. If journal volume remains full for the period of time specified by the Data Overflow Watch parameter, the primary array suspends the affected journal groups due to a failure. Continuous Access Journal Pair State If the amount of data in the journal cache, in the secondary subsystem, reaches the specified journal cache capacity, the secondary subsystem stores the received journal data into the restore journal volume, and then issues the next read-journal command to the primary subsystem. This suppresses the cache usage rate increase. To accommodate, the Continuous Access Journal retains the PAIR state when the Continuous Access links fail while the Continuous Access Asynchronous switches to PSUE state as long as the journal volumes has enough space. In addition, this allows host write-data to be kept continuously as journal data in the journal volumes while the updated data is not being replicating to the remote array. Once the links are recovered, the data replication of the primary and secondary arrays is resumed automatically. The journal data accumulated in the primary journal volumes is replicated to the secondary site automatically. NOTE: If the journal volumes get full, the pair state will be switched to PFUS and the data written to the data volume is tracked in bitmap. Limitations of XP12000 Continuous Access Journal The following two sections describe the “One-to-One Volume Copy Operations” and “One-to-One Journal Group Operations” limitations of the XP12000 Continuous Access Journal. One-to-One Volume Copy Operations Continuous Access Journal requires a one-to-one relationship between the logical volumes of the volume pairs. A volume can only be assigned to one journal group pair at a time. 
Overview of Continuous Access XP Concepts 143 NOTE: Continuous Access Journal does not support operations in which one primary data volume is copied to more than one secondary data volume, or more than one primary data volume is copied to one secondary data volume. One-to-One Journal Group Operations The Continuous Access Journal supported configuration for a journal group pair is a one-to-one relationship. This means one journal group in one XP12000 can only pair with one journal group in another XP12000. Journal Group Requirement The journal groups require that each data volume pair be assigned to one and only one journal group. Configuring XP12000 Continuous Access Journal One journal group can contain multiple journal volumes. Each of the journal volumes can have different volume sizes and different RAID configurations. Journal data will be stored sequentially and separately into each journal volume in the same journal group, and each of the journal volumes that are used equally. Journal volumes in the same journal group can be of different capacity. A journal volume in primary subsystem and the corresponding restore journal volume can be of different capacity. Registering Journal Volumes Unlike the Continuous Access Asynchronous device group that only contains data volumes, a journal group includes data volumes as well as journal volumes. Journal volumes must be registered in a journal group before creating a data volume pair for the first time in the journal group. Journal volumes are assigned to a specific journal group. Each journal group has it own ID. The journal volumes assigned to the specific journal group can be used to create one journal group pair. One journal group (JID) on primary array and one journal group (JID) on secondary array are used to create a journal group pair. Be sure to register journal volumes to journal groups on both primary and secondary arrays. The number and capacity of the journal volumes for a specific journal group on a primary or secondary array depends on the business need and IT infrastructure. To register journal volumes in a journal group use the “HP StorageWorks Command View XP”. For more information on this feature, refer to the HP-UX 11i Version 2 Release Notes. 144 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP NOTE: The “HP StorageWorks Command View XP” utility is the only way to register journal volumes. (No command line interface is available for registering the journal volumes). Journal volumes can be registered in a journal group or can be deleted from a Journal group. Journal volumes cannot be registered or deleted when data copying is performed (that is, when one or more data volume pairs exist). The journal volumes can be deleted from a journal group in the following occasions: • When the journal group does not contain data volumes (that is, before a data volume pair is created for the first time in the journal group, or after all data volume pairs are deleted). • When all data volume pairs in the journal group are suspended. If a path is defined from a host to a volume, do not register the volume as a journal volume and define paths from hosts to journal volumes. This means that hosts cannot read from and write to journal volumes. Data Replication Connections The remote copy connections are the physical paths used by the primary array to communicate with the secondary array. 
The primary XP12000 array and secondary XP12000 array are connected using fiber-channel interface (Note: ESCON is not supported with the XP12000). Ensure the connection is established in a bidirectional manner. Metrocluster package vs. Journal Group Metrocluster Continuous Access XP supports only one journal group pair per package. Thus, in a metropolitan cluster, the number of packages can be configured to use journal group is limited by either the maximum number of journal groups that are supported by the XP12000 in the configuration, or by the maximum number of packages in the cluster, which ever is smaller. Creating the Cluster Create the cluster or clusters according to the process described in the Managing Serviceguard user’s guide. In the case of a metropolitan cluster, create a single Serviceguard cluster with components on multiple sites. In the case of a continental cluster, create two distinct Serviceguard clusters on different sites. Creating the Cluster 145 NOTE: Do not configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not a supported configuration. Preparing the Cluster for Data Replication This section assumes that you have already created one or more Serviceguard clusters for use in a disaster tolerant configuration. The following three sets of procedures will prepare Serviceguard clusters for use with Continuous Access XP data replication in a metropolitan or continental cluster. • Creating the RAID Manager Configuration • Defining Storage Units • Configuring Packages Creating the RAID Manager Configuration Use these steps to create the configuration: 1. Ensure that the XP Series disk arrays are correctly cabled to each host system that will run packages whose data reside on the arrays. Each XP Series disk array must be configured with redundant Continuous Access links, each of which is connected to a different LCP or RCP card. To prevent a single point of failure (SPOF), there must be at least two physical boards in each XP for the Continuous Access links. Each board usually has multiple ports. However, a redundant Continuous Access link must be connected to a port on a different physical board from the board that has the primary Continuous Access link. When using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations in which you want to allow failback. 2. 3. Install the Raid Manager XP software on each host system that has data residing on the XP disk arrays. Edit the /etc/services file, adding an entry for the Raid Manager instance to be used with the cluster. The format of the entry is: horcm /udp For example: horcm0 146 11000/udp #Raid Manager instance 0 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 4. Use the ioscan command to determine what devices on the XP disk array have been configured as command devices. The device-specific information in the right most column of the ioscan output will have the suffix-CM for these devices; for example, OPEN-3-CM. 
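For example (an illustrative sketch only; the instance number, hardware path, and exact column layout below are assumptions, not output reproduced from this guide), a command device might appear in the ioscan listing as follows, identified by the -CM suffix in the description:
# ioscan -funC disk
...
disk  12  0/4/0/0.1.0   sdisk   CLAIMED   DEVICE   HP   OPEN-3-CM
                        /dev/dsk/c4t1d0   /dev/rdsk/c4t1d0
...
The raw device files found this way (here /dev/rdsk/c4t1d0 and its alternate-link counterparts) are the ones later listed under HORCM_CMD in the Raid Manager configuration file.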
If there are no configured command devices on the disk array, you must create two before proceeding. Each command device must have alternate links (PVLinks). The first command device is the primary command device. The second command device is a redundant command device and is used only upon failure of the primary command device. The command devices must be mapped to the various host interfaces by using the SVP (disk array console) or a remote console. 5. Copy the default Raid Manager configuration file to an instance-specific name. # cp /etc/horcm.conf /etc/horcm0.conf 6. Create a minimum Raid Manager configuration file by editing the following fields in the file created in the previous step: • HORCM_MON—enter the host-name of the system on which you are editing and the TCP/IP port number specified for this Raid Manager instance in the /etc/services file. • HORCM_CMD—enter the primary and alternate link device file names for both the primary and redundant command devices (for a total of four raw device file names). CAUTION: Make sure that the redundant command device is NOT on the same physical device as the primary command device. Also, make sure that it is on a different bus inside the XP series disk array. 7. If the Raid Manager protection facility is enabled, set the HORCPERM environment variable to the pathname of the HORCM permission file, then export the variable. # export HORCMPERM=/etc/horcmperm0.conf If the Raid Manager protection facility is not used or disabled, export the HORCPERM environment variable. # export HORCMPERM=MGRNOINST 8. Start the Raid Manager instance by using horcmstart.sh . # horcmstart.sh 0 9. Export the environment variable that specifies the Raid Manager instance to be used by the Raid Manager commands. For example, with the POSIX shell type. # export HORCMINST=0 Now, use Raid Manager commands to get further information from the disk arrays. To verify the software revision of the Raid Manager and the firmware revision of the XP disk array. Preparing the Cluster for Data Replication 147 # raidqry -l NOTE: Check for the minimum requirement level for XP, Raid Manager software, and firmware for your version listed in the Metrocluster Continuous Access XP Release Notes. Identifying HP-UX device files Before you create volume groups, you must determine the Device Special Files (DSFs) of the corresponding LUNs used in the XP array. To determine the legacy DSFs corresponding to the LUNs in the XP array: # ls /dev/rdsk/* | raidscan -find -fx Following is the output that is displayed: DEVICE_FILE UID S/F PORT TARG LUN SERIAL LDEV PRODUCT_ID /dev/rdsk/c5t0d0 0 F CL3-E 0 0 10053 321 OPEN-3 This output displays the mapping between the legacy DSFs and the CU:LDEVs. In this output the value for LDEV specifies the CU:LDEV without the : mark. To determine the agile DSFs that are supported from HP-UX 11i v3 and CU:LDEV mapping information run the following command: # ls /dev/rdisk/* | raidscan -find -fx Following is the output that you will see: DEVICE_FILE UID S/F PORT TARG LUN SERIAL LDEV PRODUCT_ID /dev/rdisk/disk232 0 F CL4-E 0 0 10053 321 OPEN-3 NOTE: There must also be alternate links for each device, and these alternate links must be on different busses inside the XP disk array. For example, these alternate links may be CL2-E and CL2-F. Unless the devices have been previously paired either on this or another host, the devices will show up as SMPL (simplex). Paired devices will show up as PVOL (primary volume) or SVOL (secondary volume). 
XP arrays (XP 10000/XP 12000 and beyond) support external attached storage devices to be configured as either P-VOL or S-VOL or both of a Continuous Access pair. From a Continuous Access perspective, there is no difference between a pair created from internal devices and a pair created on external devices. Refer to the HP StorageWorks XP documentation for information on the configuration requirements of external storage devices attached to XP arrays and supported external storage devices.
10. Determine which devices will be used by the application package. Define a device group that contains all of these devices. It is recommended that you use a name that is easily associated with the package. For example, a device group name of “db-payroll” is easily associated with the database for the payroll application. A device group name of “group1” would be more difficult to relate to an application.
Edit the Raid Manager configuration file (horcm0.conf in the above example) to include the devices and device group used by the application package. Only one device group may be specified for all of the devices that belong to a single application package. These devices are specified in the field HORCM_DEV. Also complete the HORCM_INST field, supplying the names of only those hosts that are attached to the XP disk array that is remote from the disk array directly attached to this host. For example, with the continental cluster shown in Figure 3-4 (node 1 and node 2 in the primary cluster and node 3 and node 4 in the recovery cluster), you would specify only node 3 and node 4 in the HORCM_INST field in a file you are creating on node 1 in the primary cluster. Node 1 would have previously been specified in the HORCM_MON field.
Figure 3-4 Disaster Tolerant Cluster (two data centers, each with an XP disk array; nodes 1 and 2 attach to the local array and nodes 3 and 4 to the remote array, with PVOLs and SVOLs for packages A and B replicated over redundant Continuous Access links and connected by a highly available network)
11. Restart the Raid Manager instance so that the new information in the configuration file is read.
# horcmshutdown.sh
# horcmstart.sh
12. Repeat these steps on each host that will run this particular application package. If a host may run more than one application package, you must incorporate device group and host information for each of these packages. Note that the Raid Manager configuration file must be different for each host, especially for the HORCM_MON and HORCM_INST fields.
13. If not previously done, create the paired volumes.
# paircreate -g -f -vl -c 15
This command must be issued before creating volume groups. For creating a pair of journal groups, refer to the next section, “Pair Creation of Journal Groups”.
CAUTION: Paired devices must be of compatible sizes and types. When using the paircreate command to create PVOL/SVOL Continuous Access pairs, specify the -c 15 switch to ensure the fastest data copy from PVOL to SVOL.
Pair Creation of Journal Groups
Continuous Access Journal has the same characteristics as Continuous Access Asynchronous, and Raid Manager controls Continuous Access Journal in a similar way.
The Raid Manager configuration of the device group pair for Continuous Access Journal is exactly the same as the configuration of the Continuous Access Asynchronous device group pair. In the /etc/horcm0.conf file, do not specify any journal volumes or journal group number. Only data volumes (device group and it’s devices) need to be in the configuration file. Creating Continuous Access Journal Pair To create a journal group pair, use the paircreate command. # paircreate -g -f async -vl -c 15 -jp \ -js Similar to Continuous Access Asynchronous, the fence “async” must be assigned to the command with two additional options -jp and -js. -jp : This option is to specify a journal group ID for PVOL -js : This option is to specify a journal group ID for SVOL The -jp and -js options are required if the device group is configured to use Continuous Access Journal. The used with -jp and -js option do not need to be the same. Sample Raid Manager Configuration File The following is an example of a Raid Manager configuration file for one node (ftsys1). ## horcm0.conf.ftsys1- This is an example Raid Manager configuration file for node ftsys1.Note that this configuration file is for Raid Manager instance 0, which can be determined by the "0" in the filename "horcm0.conf". #Whenever this configuration file is changed, you must stop and restart the instance of Raid Manager before the changes will be recognized. This can be done using the following commands:# 150 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP horcmshutdown.sh # horcmstart.sh # After restarting the Raid Manager instance, you should confirm that there are no configuration errors reported by running the pairdisplay command with the "-c" option. # # NOTE: The Raid Manager command device (RORCM_CMD) cannot be used for # data storage (it is reserved for private Raid Manager usage). #/************************ HORCM_MON *************************************/ # # The HORCM_MON parameter is used for monitoring and control of device groups # by the Raid Manager. # It is used to define the IP address, port number, and paired volume error # monitoring interval for the local host. # # Defines a network address used by the local host. This can be a host name # or an IP address. # # Specifies the port name assigned to the Raid Manager communication path, # which is must also be defined in /etc/services. If a port number, rather # than a port name is specified, the port number will be used. # # Specifies the interval used for monitoring the paired volumes. By # increasing this interval, the Raid Manager daemon load is reduced. # If this interval is set to -1, the paired volumes are not monitored. # # Specifies the time-out period for communication with the Raid Manager # server. HORCM_MON #ip_address ftsys1 service horcm0 poll_interval(10ms) 1000 timeout(10ms) 3000 #/************************* HORCM_CMD *************************************/ # # The HORCM_CMD parameter is used to define the special files (raw device # file names) of the Raid Manager command devices used for the monitoring # and control of Raid Manager device groups. # Define the special device files corresponding to two or more command devices # in order to use the Raid Manager alternate command device feature. An # alternate command device must be configured, otherwise a failure of a # single command device could prevent access to the device group. # Each command device must have alternate links (PVLinks). 
The first command # device is the primary command device. The second command device is a # redundant command device and is used only upon failure of the primary # command device. The command devices must be mapped to the various host # interfaces by using the SVP (disk array console) or a remote console. HORCM_CMD #Primary #dev_name /dev/rdsk/c4t1d0 Primary Alt-Link dev_name /dev/rdsk/c5t1d0 Secondary dev_name /dev/rdsk/c4t0d1 Secondary Alt-link dev_name /dev/rdsk/c5t0d1 #/************************* HORCM_DEV *************************************/ # # The HORCM_DEV parameter is used to define the addresses of the physical # volumes corresponding to the paired logical volume names. Each group # name is a unique name used by the hosts which will access the volumes. # # The group and paired logical volume names defined here must be the same for # all other (remote) hosts that will access this device group. Preparing the Cluster for Data Replication 151 # # # # # # # # # # # # # # # # # # # # The hardware SCSI bus, SCSI-ID, and LUNs for the device groups do not need to be the same on remote hosts. This parameter is used to define the device group name for paired logical volumes. The device group name is used by all Raid Manager commands for accessing these paired logical volumes. This parameter is used to define the names of the paired logical volumes in the device group. This parameter is used to define the XP256 port number used to access the physical volumes in the XP256 connected to the "dev_name". Consult your XP256 for valid Port numbers to specify here. This parameter is used to define the SCSI target ID of the physical volume on the port specified in "port#". This parameter is used to define the SCSI logical unit number (LUN) of the physical volume specified in "targetID". HORCM_DEV #dev_group pkgA pkgA pkgA pkgB pkgC pkgD dev_name pkgA_index pkgA_tables pkgA_logs pkgB_d1 pkgC_d1 pkgD_d1 port# CL1-E CL1-E CL1-E CL1-E CL1-E CL1-E TargetID 0 0 0 0 0 0 LUN# 1 2 3 4 5 2 #/************************* HORCM_INST ************************************/ # # This parameter is used to define the network address (IP address or host # name) of the remote hosts which can provide the remote Raid Manager access # for each of the device group secondary volumes. # The remote Raid Manager instances are required to get status or provide # control of the remote devices in the device group. All remote hosts # must be defined here, so that the failure of one remote host will prevent # obtaining status. # # # This is the same device group names as defined in dev_group of HORC_DEV. # # This parameter is used to define the network address of the remote hosts # with Raid Manager access to the device group. This can be either an # IP address or a host name. # # This parameter is used to specify the port name assigned to the Raid # Manager instance, which must be registered in /etc/services. If this is # a port number rather than a port name, then the port number will be used. HORCM_INST #dev_group pkgA pkgA pkgB pkgB pkgC pkgC pkgD pkgD 152 ip_address ftsys1a ftsys2a ftsys1a ftsys2a ftsys1a ftsys2a ftsys1a ftsys2a service horcm0 horcm0 horcm0 horcm0 horcm0 horcm0 horcm0 horcm0 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Notes on the Raid Manager Configuration A single XP device group must be defined for each package on each host that is connected to the XP series disk array. Device groups are defined in the Raid Manager configuration file under the heading HORCM_DEV. 
The disk target IDs and LUNs for all Physical Volumes (PVs) defined in Volume Groups (VGs) that belong to the package must be defined in one XP device group on each host system that may ever run one or more Continentalclusters packages. The device group name (dev_group) is user-defined and must be the same on each host in the continental cluster that accesses the XP disk array. The device group name (dev_group) must be unique within the cluster; it should be a name that is easily associated with the application name or Serviceguard package name. The TargetID and LU# fields for each device name may be different on different hosts in the clusters, to allow for different hardware I/O paths on different hosts. See the sample convenience scripts in the Samples directory included with this toolkit for examples. Configuring Automatic Raid Manager Startup After editing the Raid Manager configuration files and installing them on the nodes that are attached to the XP Series disk arrays, you should configure automatic Raid Manager startup on the same nodes. This is done by editing the rc script /etc/ rc.config.d/raidmgr. Set the START_RAIDMGR parameter to 1, and define RAIDMGR_INSTANCE as the number of the Raid Manager instances being used. By default, this is zero (0). An example of the edited startup file is shown below: #*************************** RAIDMANAGER ************************* # # # # # # # # # # # # # # # # # # # # # # Metrocluster with Continuous Access Toolkit script for configuring the startup parameters for a HP StorageWorks Disk Array XP Raid Manager instance. The Raid Manager instance must be running before any Metrocluster package can start up successfully. @(#) $Revision: 1.8 $ START_RAIDMGR: If set to 1, this host will attempt to start up an instance of the Disk Array XP Raid Manager, which must be running before a Metrocluster package can be successfully started. If set to 0, this host will not attempt to start the Raid Manager. RAIDMGR_INSTANCE This is the instance number of the Raid Manager instance to be started by this script. The instance number specified here must be the same as the instance number specified in the Metrocluster package control script. Consult your Raid Manager documentation for more information on Raid Manager instances. See the Metrocluster and Raid Manager documentation for more information Preparing the Cluster for Data Replication 153 # on configuring this script. # START_RAIDMGR=0 RAIDMGR_INSTANCE=0 Defining Storage Units Both LVM and VERITAS VxVM storage can be used in disaster tolerant clusters. The following sections show how to set up each type of volume group: Creating and Exporting LVM Volume Groups using Continuous Access XP Use the following procedure to create and export volume groups: 1. Define the appropriate Volume Groups on each host system that might run the application package. # mkdir /dev/vgxx # mknod /dev/vgxx/group c 64 0xnn0000 where the name /dev/vgxx and the number nn are unique within the entire cluster. 2. Create the Volume Group on the source volumes. # pvcreate -f /dev/rdsk/cxtydz # vgcreate /dev/vgname /dev/dsk/cxtydz 3. 4. Create the logical volume(s) for the volume group. Export the Volume Groups on the primary system without removing the special device files. # vgchange -a n Make sure that you copy the mapfiles to all of the host systems. # vgexport -s -p -m 5. On the primary cluster import the VGs on all of the other systems that might run the Serviceguard package and backup the LVM configuration. 
# vgimport -s -m # vgchange -a y # vgcfgbackup # vgchange -a n 6. On the recovery cluster import the VGs on all of the systems that might run the Serviceguard recovery package and backup the LVM configuration. # pairsplit -g -rw # vgimport -s -m # vgchange -a y 154 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP # vgcfgbackup # vgchange -a n # pairresync -g -c 15 Skip the pairsplit/pairresync, however this will not activate the volume group to perform the vgcfgbackup. Perform the vgcfgbackup when the volume group is activated during the first recovery package activation. When using the pairresync command to resynchronize PVOL/SVOL Continuous Access pairs, specify the -c 15 switch to ensure the fastest resynchronization which reduces the vulnerability of a rolling disaster. From HP-UX 11i v3 onwards, HP recommends that you use agile DSF naming model for mass storage devices. For more information on the agile view, mass storage on HP-UX, DSF migration and LVM Online Disk Replacement, see the following documents that are available at http://www.docs.hp.com: • • • • LVM Migration from Legacy to Agile Naming Model HP-UX 11i v3 Mass Storage Device Naming Serviceguard Migration LVM Online Disk Replacement Creating VxVM Disk Groups using Continuous Access XP If using VERITAS storage, use the following procedure to create disk groups. It is assumed a VERITAS root disk (rootdg) has already been created on the system where configuring the storage. The following section shows how to set up VERITAS disk groups. On one node do the following: 1. Create the device pair to be used by the package. # paircreate -g devgrpA -f never -vl -c 15 2. Check to make sure the devices are in the PAIR state. # pairdisplay -g devgrpA 3. Initialize disks to be used with VxVM by running the vxdisksetup command only on the primary system. # /etc/vx/bin/vxdisksetup -i c5t0d0 4. Create the disk group to be used with the vxdg command only on the primary system. # vxdg init logdata c5t0d0 5. Verify the configuration. # vxdg list Preparing the Cluster for Data Replication 155 6. Use the vxassist command to create the logical volume. # vxassist -g logdata make logfile 2048m 7. Verify the configuration. # vxprint -g logdata 8. Make the filesystem. # newfs -F vxfs /dev/vx/rdsk/logdata/logfile 9. Create a directory to mount the volume group. # mkdir /logs 10. Mount the volume group. # mount /dev/vx/dsk/logdata/logfile /logs 11. Check if file system exits, then unmount the file system. # umount /logs IMPORTANT: VxVM 4.1 does not support the agile DSF naming convention with HP-UX 11i v3. Validating VxVM Disk Groups using Metrocluster/Continuous Access Data Replication The following section describes how to validate the VERITAS disk groups on one node: 1. Deport the disk group. # vxdg deport logdata 2. Enable other cluster nodes to have access to the disk group. # vxdctl enable 3. Suspend the Continuous Access link and have SVOL Read/Write permission. # pairsplit -g devgrpA -rw 4. Import the disk group. # vxdg -tfC import logdata 5. Start the logical volume in the disk group. # vxvol -g logdata startall 6. Create a directory to mount the volume. # mkdir /logs 7. Mount the volume. # mount /dev/vx/dsk/logdata/logfile /logs 156 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 8. Check to make sure the file system is present, then unmount the file system. # umount /logs 9. Resynchronize the Continuous Access pair device. 
# pairresync -g devicegroupname -c 15 Configuring Packages for Disaster Recovery When you have completed the following steps, packages will be able to fail over to an alternate node in another data center and still have access to the data that they need in order to operate. This procedure must be repeated on all the cluster nodes for each Serviceguard package so the application can fail over to any of the nodes in the cluster. Customizations include editing an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user’s guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, you can configure and test one package at a time. 1. Create a directory /etc/cmcluster/pkgname for each package. # mkdir /etc/cmcluster/pkgname 2. Create a package configuration file. # cd /etc/cmcluster/pkgname # cmmakepkg -p pkgname.config Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/ pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. 3. In the .config file, list the node names in the order in which you want the package to fail over. It is recommended for performance reasons, to have the package fail over locally first, then to the remote data center. Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time required to obtain status from the XP Series disk array. If you are using a fence level of ASYNC, then the RUN_SCRIPT_TIMEOUT should be greater than the value of HORCTIMEOUT in the package environment file (see step 7g below). Configuring Packages for Disaster Recovery 157 NOTE: If you are using the EMS disk monitor as a package resource, you must not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is no access from the host to the package disks. This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices due to the time needed to get device status from the XP Series disk array. Clusters with multiple packages that use devices on the XP Series disk array will all cause package startup time to increase when more than one package is starting at the same time. 4. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater. 5. 6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions. Copy the environment file template /opt/cmcluster/toolkit/ SGCA/xpca.env to the package directory, naming it pkgname_xpca.env. 
# cp /opt/cmcluster/toolkit/SGCA/xpca.env \ /etc/cmcluster/pkgname/pkgname_xpca.env NOTE: If you do not use a package name as a filename for the package control script, you must follow the convention of the environment file name. This is the combination of the file name of the package control script without the file extension, an underscore and type of the data replication technology (xpca) used. The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen. Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_xpca.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_xpca.env. 7. 158 Edit the environment file _xpca.env as follows: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script. b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to/etc/cmcluster/package_name, removing any quotes around the file names. d. Uncomment the DEVICE_GROUP variable and set it to this package’s Raid Manager device group name, as specified in the Raid Manager configuration file. e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if Raid Manager protection facility is not used or disabled. If Raid Manager protection facility is enabled set it to the name of the HORCM permission file. f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access. g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array. h. If you are using asynchronous data replication, set the HORCTIMEOUTvariable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds. i. Uncomment the CLUSTER_TYPE variable and set it to METRO if you are using Metrocluster, or CONTINENTAL if you are using Continentalclusters. 8. After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script using the following command (be sure to include the -n option to perform syntax checking only): # sh -n If any messages are returned, you should correct the syntax errors. 9. Check the configuration using the cmcheckconf -P pkgname.config, then apply the Serviceguard configuration using the cmapplyconf -P pkgname.config command or SAM. Configuring Packages for Disaster Recovery 159 10. 
Distribute Metrocluster/Continuous Access configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via the .rhosts may create a security issue. 11. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname. pkgname.cntl Seviceguard package control script pkgname_xpca.env Metrocluster/Continuous Access environment file pkgname.config Serviceguard package ASCII configuration file pkgname.sh Package monitor shell script, if applicable other files Any other scripts you use to manage Serviceguard packages The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster/Continuous Access. Completing and Running a Metrocluster Solution with Continuous Access XP No additional steps are required after cluster and package configuration to complete the setup of the metropolitan cluster. In normal operation, the metropolitan cluster with Continuous Access XP starts like any other cluster, and runs and halts packages in the same way as a standard cluster. However, startup time for packages may be considerably slower because of the need to check disk status on both disk arrays. Maintaining a Cluster that uses Metrocluster with Continuous Access XP While the cluster is running, performing manual “changes of state” for devices on the XP Series disk array can cause the package to halt. This is due to unexpected conditions and can cause the package not to start up after a failover. In general, it is recommended that no manual “changes of state” be performed while the package and the cluster are running. 160 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP NOTE: Manual changes can be made when they are required to bring the device group into a “protected” state. For example, if a package starts up with data replication suspended, a user can perform a pairresync command to re-establish data replication while the package is still running. Viewing the Progress of Copy Operations While a copy is in progress between XP systems (that is, the volumes are in a COPY state), the progress of the copy can be viewed by monitoring the % column in the output of the pairdisplay command: # pairdisplay -g pkgB -fc -CLI Group pkgB pkgB PairVol L/R pkgD-disk0 L pkgD-disk0 R Port# TID LU Seq# LDEV# P/S Status Fence CL1-C 0 3 35422 463 P-VOL COPY NEVER CL1-F 0 3 35663 3 S-VOL COPY NEVER % P-LDEV# M 79 460 0 - This display shows that 79% of a current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state. Viewing Side File Size If you are using asynchronous data replication, you can see the current size of the side file when the volumes are in a PAIR state by using the pairdisplay command. 
The following output, obtained during normal cluster operation, shows the percentage of the side file that is full: # pairdisplay -g pkgB -fc -CLI Group pkgB pkgB PairVol L/R pkgD-disk0 L pkgD-disk0 R Port# TID LU Seq# LDEV# P/S Status Fence CL1-C 0 3 35422 463 P-VOL PAIR ASYNC CL1-F 0 3 35663 3 S-VOL PAIR ASYNC % P-LDEV# M 35 3 0 463 - This output shows that 35% of the side file is full. When volumes are in a COPY state, the % column shows the progress of the copying between the XP frames, until it reaches 100%, at which point the display reverts to showing the side file usage in the PAIR state. Viewing the Continuous Access Journal Status The following two sections describe using the pairdisplay and raidvchkscan commands for viewing the Continuous Access Journal Status. Viewing the Pair and Journal Group Information - Raid Manager using the “pairdisplay” Command The command option “-fe” is added to the Raid Manager pairdisplay command. This option is used to display the Journal Group ID (and other data) of a device group pair. The Journal Group ID shows ‘-’ if the device pair is not in Continuous Access Journal mode. Otherwise, it shows a number. Completing and Running a Metrocluster Solution with Continuous Access XP 161 An example of the pairdisplay command with the “-fe” is as below: The pairdisplay -fe is primarily used for the following: Continuous Access Journal device group consistency set, Journal group ID (JID), and Continuous Access link status (AP). # pairdisplay -g oradb -fe Group Seq#, LDEV# P/S,Status, Fence, %, P-LDEV# M CTG JID oradb 30053 64 P-VOL PAIR Never, 75 C8 1 oradb 30054 C8 S-VOL PAIR Never, 64 1 AP EM E-Seq# 2 0 E-LDEV# 0 Viewing the Journal Volumes Information - Raid Manager using the “raidvchkscan” Command The raidvchkscancommand supports the option (-v jnl [unit#]) in order to find the journal volume lists, and displays information for the journal volumes. raidvchkscan { -h -q -z -v jnl [unit#] [ -s Seq# ] [ -f[x ] | } An example of the raidvchkscan command is as follows: # raidvchkscan –v jnl 0 JID MU CTG JNLS AP U(%) 001 0 1 PJNN 4 21 002 1 2 PJNF 4 95 003 0 3 PJSN 4 0 004 0 4 PJSF 4 45 005 0 5 PJSE 0 0 006 - SMPL 007 0 6 SMPL 4 5 Q-Marker 43216fde 3459fd43 1234f432 345678ef Q-CNT 30 52000 78 66 D-SZ(BLK) Seq# Nnm LDEV# 512345 62500 2 265 512345 62500 3 270 512345 62500 1 275 512345 62500 1 276 512345 62500 1 277 512345 62500 1 278 512345 62500 1 278 Figure 3-5 shows the illustration for Q-Marker and Q-CNT. The following terms define the meaning for contents in the figure. • • • • JID: Displays the journal group ID. MU: Displays the mirror descriptions on the journal group. CTG: Displays the consistency group ID. JNLS: Displays the following status in the journal group. — SMPL: this means the journal volume is no in pair mode or is in deleting state. — P(S)JNN: this means “P(S)vol Journal Normal” — P(S)JSN: this means “P(S)vol Journal Suspend Normal” — PJNF: this means “P(S)vol Journal Normal Full” — P(S)JSF: this means “P(S)vol Journal Suspend Full” — P(S)JSE: this means “P(S)vol Journal Suspend Error” including Link failure • AP: shows the number of active path on the initiator port in Continuous Access links. Q-Marker: Displays the sequence number in the journal group. In case of the P-JNL, Q-Marker shows the latest sequence number on P-JNL volume. In case of the S-JNL, Q-Marker shows the latest sequence number putting on the cache. Q-CNT: Displays the number of remaining Q-Marker of a journal group. 
• • • • 162 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Figure 3-5 Q-Marker and Q-CNT Q-Marker (#9) of P-JNL Q-Marker (#2) of S-JNL R/W P-JNL S-JNL 9 8 7 6 5 4 7 6 5 4 3 3 P-VOL • • • • • Q-CNT Asynchronous transfer Q-CNT S-VOL U(%): Displays the usage rate of the journal data. D-SZ: Displays the capacity for the journal data on the journal group. Seq#: Displays the serial number of the XP12000. Num: Displays the number of LDEV (journal volumes) configured for the journal group. LDEV#: Displays the first LDEV number of journal volumes. Normal Maintenance There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the Metrocluster/Continuous Access: 1. Stop the package with the appropriate Serviceguard command. # cmhaltpkg pkgname 2. Split links for the package. # pairsplit -g -rw 3. Distribute the Metrocluster with Continuous Access XP configuration changes. # cmapplyconf -P pkgname.config 4. Start the package with the appropriate Serviceguard command: # cmmodpkg -e pkgname Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation is based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators” (page 31). Completing and Running a Metrocluster Solution with Continuous Access XP 163 Resynchronizing After certain failures, data is no longer remotely protected. In order to restore disaster tolerant data protection after repairing or recovering from the failure, you must manually run the command pairresync. This command must successfully complete for disaster-tolerant data protection to be restored. Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection: • Failure of all Continuous Access links without restart of the application • Failure of all Continuous Access links with Fence Level “DATA” with restart of the application on a primary host • Failure of the entire secondary Data Center for a given application package • Failure of the secondary XP Series disk array for a given application package while the application is running on a primary host Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure: • Failure of the entire primary data center for a given application package • Failure of all of the primary hosts for a given application package • Failure of the primary XP Series disk array for a given application package • Failure of all Continuous Access links with restart of the application on a secondary host Pairs must be manually recreated if both the primary and secondary XP Series disk array are in SMPL (simplex) state. Make sure you periodically review the files syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings and recommended actions. It is recommended to review these files after system, data center, or application failures. 
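As a hedged illustration of the manual resynchronization described above, the following sequence could be used after all Continuous Access links have been repaired while the package remained running on its primary host. The device group name pkgB is only an example; verify the reported states before and after each step in your own configuration:
# pairvolchk -g pkgB -s
# pairvolchk -g pkgB -c
# pairresync -g pkgB
# pairdisplay -g pkgB -fc -CLI
The two pairvolchk commands report the local and remote pair status, pairresync resumes replication from the PVOL, and pairdisplay can then be used to watch the % column until the volumes return to the PAIR state.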
Full resynchronization must be manually initiated after repairing the following failures: • Failure of the secondary XP Series disk array for a given application package followed by application startup on a primary host • Failure of all Continuous Access links with Fence Level NEVER and ASYNC with restart of the application on a primary host Using the pairresync Command The pairresync command can be used with special options; after a failover in which the recovery site has started the application, and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the Continuous Access link is fixed, use the pairresync command in one of the following two ways depending on which site you are on: 164 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP • • pairresync -swapp—from the primary site. pairresync -swaps—from the failover site. These options take advantage of the fact that the recovery site maintains a bit-map of the modified data sectors on the recovery array. Either version of the command will swap the personalities of the volumes, with the PVOL becoming the SVOL and SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now PVOL) are then copied back to the SVOL (now running on the primary site). During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site, and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site. NOTE: The preceding steps are automated provided the default value of 1 is being used for the auto variable AUTO_PSUEPSUS. Once the Continuous Access link failure has been fixed, the user only needs to halt the package on the recovery cluster and restart on the primary cluster. However, if you want to reduce the amount of application downtime, you should manually invoke pairresync before failback. Failback After resynchronization is complete, you can halt the package on the failover site, and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site. Timing Considerations In a journal group, many journal volumes can be configured to hold a significant amount of the journal data (host-write data). The package startup time may increase significantly when a Metrocluster Continuous Access package fails over. Delay in package startup time will occur in these situations: 1. When recovering from broken pair affinity. On failover, the SVOL pull all the journal data from PVOL site. The time needed to complete all data transfer to SVOL depends on the amount of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links. 2. When host I/O faster than Continuous Access data replication. The outstanding data not being replicated to the SVOL is accumulated in journal volumes. Upon package fail over to the SVOL site, the SVOL pull all the journal data from PVOL site. The completion of the all data transfer to the SVOL depends on the bandwidth of the Continuous Access links and amount of outstanding data in the PVOL journal volume. 
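To estimate how much outstanding journal data still has to be drained, and therefore how long package startup might be delayed, the commands shown earlier in this chapter can be reused. The following is only a sketch; the journal unit number and device group name are examples:
# raidvchkscan -v jnl 0
# pairdisplay -g pkgB -fc -CLI
In the raidvchkscan output, a large Q-CNT or a high U(%) value for the package's journal group indicates a large backlog of un-replicated data; in the pairdisplay output, the % column shows the progress of the copy while the volumes are in the COPY state.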
Completing and Running a Metrocluster Solution with Continuous Access XP 165 Data maintenance with the failure of a Metrocluster Continuous Access XP Failover The following sections, “Swap Takeover Failure (Asynchronous/Journal mode)” and “Takeover Timeout (for Continuous Access Journal mode)” describes data maintenance upon failure of a Metrocluster Continuous Access XP failover. Swap Takeover Failure (Asynchronous/Journal mode) When a device group pair state is SVOL-PAIR at a local site and is PVOL-PAIR at the remote site, the Metrocluster Continuous Access performs a swap takeover. The swap takeover would fail if there is an internal (unseen) error (for example, cache or shared memory failure) in the device group pair. In this case, if the AUTO-NONCURDATA is set to 0, the package will not be started and the SVOL state is change to SVOL-PSUE (SSWS) by the takeover command. The PVOL site either remains in PVOL-PAIR or is changed to PVOL-PSUE. The SVOL is in SVOL-PSUE(SSWS) meaning that the SVOL is read/write enabled and the data is usable but not as current as PVOL. In this case, either use FORCEFLAG to startup the package on SVOL site or fix the problem and resume the data replication with the following procedures: 1. Split the device group pair completely (pairsplit -g -S). 2. Re-create a pair from original PVOL as source (use paircreate command). 3. Startup package on either the PVOL site or SVOL site. Takeover Timeout (for Continuous Access Journal mode) A takeover timeout occurs when a package failover to the secondary site (SVOL) and Metrocluster Continuous Access issues takeover (either swap or SVOL takeover) command on SVOL. If the journal group pair is flushing the journal data from PVOL to SVOL and takeover timeout occurs, the package would not start and the following situations would occur: 1. The device group pair state remains in PVOL-PAIR/SVOL-PAIR. 2. The journal data is continuously transferring to the SVOL. In this case, it is required to wait for the completion of the journal data flushing and the state for each of the following: • Primary site: PVOL-PAIR or PVOL-PSUS(E) • Secondary site: SVOL-PSUS(SSWS) or SVOL-PSUE(SSWS) At this point, execute either: (1) by using the FORCEFLAG to startup the package on SVOL site or (2) to fix the problem (if any of Continuous Access links was failed) and resume the data replication with the following procedures: 1. Split the device group pair completely (pairsplit -g -S). 2. Re-create a pair from original PVOL as source (use the paircreate command). 3. Startup package on PVOL site (or SVOL site). 166 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP PVOL-PAIR with SVOL-PSUS(SSWS) State (for Continuous Access Journal Mode) PVOL-PAIR with SVOL-PSUS(SSWS) is an intermediate state. The following is one scenario that leads to this state: • • • At T1, device pair is in PVOL-PAIR/SVOL-PAIR and the AP value is 0 in SVOL site. At T2, a failover occurs; package failover from PVOL site to SVOL site. The Metrocluster Continuous Access issues SVOL-Takeover and the state will become SVOL-PSUS(SSWS) and PVOL-PAIR. At T3, all Continuous Access links have been recovered. The state stays in SVOL-PSUS (SSWS) and PVOL-PAIR. The duration the PVOL remains in PAIR state is relatively short The PVOL-PAIR/SVOL-PSUS (SSWS) is an invalid state for XP Asynchronous (both Continuous Access/Asynchronous and Continuous Access Journal). In this state, by issuing a pairresync or takeover command, it would fail. 
It is necessary to wait for the PVOL to become PSUE. XP Continuous Access Device Group Monitor In the Metrocluster/Continuous Access environment, where the device group state is not actively monitored, it may not be apparent when the application data is not remotely protected for an extended period of time. Under these circumstances, the XP/Continuous Access device group monitor provides the capability to monitor the status of the XP/Continuous Access device group used in a package. The XP/Continuous Access device group monitor, based on a pre-configured environment variable, also provides the ability to perform automatic resynchronization of the XP/Continuous Access device group upon link recovery. NOTE: If the monitor is configured to automatically resynchronize the data from PVOL to SVOL upon link recovery, a Business Copy (BC) volume of the SVOL should be configured as another mirror. In the case of a rolling disaster and the data in the SVOL becomes corrupt due to an incomplete resynchronization, the data in the BC volume can be restored to the SVOL. This will result non-current, but usable data in the BC volumes The monitor, as a package service, periodically checks the status of the XP/Continuous Access device group that is configured for the package, and sends notification to the user via email, syslog, and console if there is a change in the status of the package’s device group. XP/Continuous Access Device Group Monitor Operation Overview The XP/Continuous Access device group monitor runs as a package service. The user can configure the monitor's setting through the package's environment file. Once the Completing and Running a Metrocluster Solution with Continuous Access XP 167 package has started the XP/Continuous Access device group monitor, the monitor will periodically check the status of the XP/Continuous Access device group. If there is a change in the status or the monitor is configured to notify after an interval of no status change, the monitor will send a notification that states the reason for the notification, a timestamp, and the status of the XP/Continuous Access device group. Configuring the Monitor Use the following steps to configure a monitor for a package’s device group: • • Configure the monitor’s variables in the package environment file. Configure the monitor as a service of the package. Configure the Monitor’s Variables in the Package Environment File. Edit the following variables of the monitor’s section in the environment file _xpca.env as follows: NOTE: • • • • • • See Appendix A for an explanation of these variables. Uncomment the MON_POLL_INTERVAL variable and set it to the desired value in minutes. If this variable is not set, it will default to a value of 10 minutes. Uncomment the MON_NOTIFICATION_FREQUENCY variable and set it to the desired value. This value is used to control the frequency of notification message when the state of the device group remains the same after the first check of the device group's state. If the value is zero, the monitor will only send notification when the state of the device group has changed. If the variable is not set, the default will be 0. If you want to receive notification messages over email, uncomment the MON_NOTIFICATION_EMAIL variable and set it to a fully qualified email address. Multiple email addresses can be configured using comma as separator between the addresses. If you want notification messages to be logged in the syslog file, uncomment the MON_NOTIFICATION_SYSLOG variable and set it to 1. 
• If you want notification messages to be logged on the system's console, uncomment the MON_NOTIFICATION_CONSOLE variable and set it to 1.
• If you want an automatic resynchronization upon link recovery, uncomment the AUTO_RESYNC variable and set it to either 0, 1, or 2.
If AUTO_RESYNC is set to 0 (default), the monitor will not attempt the resynchronization from PVOL to SVOL. This setting only sends notifications.
If AUTO_RESYNC is set to 1, the monitor will split the remote BC, if one is configured, from the mirror group before attempting the resynchronization from PVOL to SVOL.
If AUTO_RESYNC is set to 2, the monitor performs the resynchronization from PVOL to SVOL only when it finds the MON_RESYNC file in the package directory on the node where the package is running. The monitor will not manage the remote BC before or after the resynchronization. Use this setting if you want to manage the BC yourself.
To enable the Continuous Access resynchronization for AUTO_RESYNC=2, create a file using the HP-UX command touch. For example:
# touch /etc/cmcluster/packageA/MON_RESYNC
(where /etc/cmcluster/packageA is the package directory)
After the monitor detects the MON_RESYNC file, the file is automatically removed.
The following is an example of the XP/Continuous Access device group monitor definition section in the environment file (_xpca.env), where the monitor will perform the following:
• poll every 15 minutes.
• send a notification on every third polling, if the state of the device group remains the same.
• send the notifications to [email protected] and [email protected].
• log notifications to the system log file, syslog.
• display notifications on the system console.
• perform automatic resynchronization with BC management when detecting that the device group local state has changed to PVOL-PSUE or PVOL-PDUB.
MON_POLL_INTERVAL=15
MON_NOTIFICATION_FREQUENCY=3
MON_NOTIFICATION_EMAIL=[email protected],[email protected]
MON_NOTIFICATION_SYSLOG=1
MON_NOTIFICATION_CONSOLE=1
AUTO_RESYNC=1
Configure XP/Continuous Access Device Group Monitor as a Service of the Package
Add the monitor as a service in the package's configuration file and control script file as follows:
• In the package's configuration file, add the following lines:
SERVICE_NAME pkgXdevgrpmon.srv
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 5
NOTE: The SERVICE_HALT_TIMEOUT value of 5 is a recommended value. If the service halt timeout is set to less than 5 seconds, the monitor may not have enough time to clean up properly.
• In the package's control script file, add the following lines in the SERVICE NAMES AND COMMANDS section:
SERVICE_NAME[0]=”pkgXdevgrpmon.srv”
SERVICE_CMD[0]=”/usr/sbin/DRMonitorXPCADevGrp ”
SERVICE_RESTART[0]=”-r 10”
CAUTION: If the Continuous Access links are still down while the monitor is trying to do the resynchronization and another failure occurs that causes a remote failover to the secondary site, the SVOL’s BC volumes will remain split from their mirror group. This occurs only if the monitor is configured to perform automatic resynchronization using AUTO_RESYNC=1.
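Once the package has been restarted with the monitor service configured, you can confirm that the service is running and that notifications are being generated. The following check is only a sketch; the package name pkgX is an example, and the exact log file name depends on how the package control script is named:
# cmviewcl -v
# tail /etc/cmcluster/pkgX/pkgX.cntl.log
The cmviewcl output should list the service pkgXdevgrpmon.srv with a status of up under the package, and the package log file should show the monitor's periodic status messages for the device group.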
Configuring the XP/Continuous Access Device Group Monitor as a Service in the Site Controller Package
The Device Group Monitor must be configured as a service in the Site Controller package for Site Aware Disaster Tolerant Architecture configurations. The Metrocluster environment file is located in the Site Controller package directory, and the same file path must be passed to the Device Group Monitor service.
The Site Controller package can be halted in the detached mode for maintenance, and it can also fail in the cluster. In either condition, the Device Group Monitoring service is not available for the workloads. The service resumes once the Site Controller package is restarted. For more information on the Site Controller package and the detached mode halt, see “Site Controller Package” (page 327) and “Maintaining Site Controller Package” (page 370).
Troubleshooting the XP/Continuous Access Device Group Monitor
The following guidelines help identify the cause of potential problems with the XP/Continuous Access device group monitor.
• Problems with email notifications: The XP/Continuous Access device group monitor uses SMTP to send email notifications. All email notification problems are logged in the package log file.
A warning message in the package log file stating that the monitor is unable to determine the SMTP port is caused by the SMTP port not being defined in the /etc/services file. In this case the monitor assumes that the SMTP port is 25. If a different port number is defined, the monitor must be restarted in order for it to connect to the correct port.
An error message in the package control log file stating that the SMTP server cannot be found is caused by not having a mail server, such as sendmail, configured on the local node. A mail server must be configured and running on the local node for email notification. Once the mail server is running on the local node, the monitor will start sending email notifications.
• Problems with Unknown Continuous Access Device Status: The XP/Continuous Access device group monitor relies on the Raid Manager instance to get the Continuous Access device group state. If the local Raid Manager instance fails, the monitor cannot determine the status of the Continuous Access device group. The monitor sends a notification to all configured destinations stating that the state has changed to an UNKNOWN status. Because the monitor does not try to restart the Raid Manager instance, you must restart the Raid Manager instance before the monitor can determine the status of the Continuous Access device group again. Make sure to start the Raid Manager instance with the same instance number that is defined in the package’s environment file.
Completing and Running a Continental Cluster Solution with Continuous Access XP
The following section describes how to configure a continental cluster solution using Continuous Access XP, which requires the Metrocluster Continuous Access product.
Setting up a Primary Package on the Primary Cluster
Use the procedures in this section to configure a primary package on the primary cluster.
Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. Completing and Running a Continental Cluster Solution with Continuous Access XP 171 NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not a supported configuration. 1. 2. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user’s guide. Install Continentalclusters on all the cluster nodes in the primary cluster (skip this step if the software has been pre installed). NOTE: Serviceguard should already be installed on all the cluster nodes. Run swinstall(1m) to install Continentalclusters and Metrocluster Continuous Access (Continuous Access) products from an SD depot. 3. When swinstall(1m) has completed, create a directory as follows for the new package in the primary cluster. # mkdir /etc/cmcluster/ Create an Serviceguard package configuration file in the primary cluster. # cd /etc/cmcluster/ # cmmakepkg -p .ascii Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster// .cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts. Only after primary packages start, use cmmodpkg to enable package switching on all primary packages. Enabling package switching in the package configuration would automatically start the primary package when the cluster starts. However, had there been a primary cluster disaster, resulting in the recovery package starting and running on the recovery cluster, the primary package should not be started until after first stopping the recovery package. 4. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters. Set LV_UMOUNT_COUNT to 1 or greater. 172 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 5. 6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. Copy the environment file template /opt/cmcluster/toolkit/ SGCA/xpca.env to the package directory, naming it pkgname_xpca.env. # cp /opt/cmcluster/toolkit/SGCA/xpca.env \ /etc/cmcluster/pkgname/pkgname_xpca.env 7. Edit the environment file _xpca.env as follows: a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script. b. Uncomment the behavioral configuration environment variables starting with AUTO_. 
It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to/etc/cmcluster/package_name, removing any quotes around the file names. d. Uncomment the DEVICE_GROUP variable and set it to this package’s Raid Manager device group name, as specified in the Raid Manager configuration file. e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if Raid Manager protection facility is not used or disabled. If Raid Manager protection facility is enabled set it to the name of the HORCM permission file. f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access. g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array. h. If using asynchronous data replication, set the HORCTIMEOUTvariable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds. i. Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL. 8. Distribute Metrocluster/Continuous Access configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ Completing and Running a Continental Cluster Solution with Continuous Access XP 173 other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via .rhosts may create a security issue. 9. Apply the Serviceguard configuration using the cmapplyconf command or SAM. 10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname: pkgname.cntl Metrocluster/Continuous Access package control script pkgname_xpca.env Metrocluster/Continuous Access environment file pkgname.ascii Serviceguard package ASCII configuration file pkgname.sh Package monitor shell script, if applicable other files Any other scripts you use to manage Serviceguard packages. The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster/Continuous Access. 11. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify the instance is to be started at boot time. The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means the Raid Manager instance must be started before starting Serviceguard. 12. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover. 13. 
Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time. Setting up a Recovery Package on the Recovery Cluster Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. 174 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, they may not be used as a cluster lock disk. Using a disk as a cluster lock disk that is part of a paired volume is not a supported configuration. 1. 2. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user’s guide. Install Continentalclusters on all the cluster nodes in the recovery cluster (skip this step if the software has been pre installed). NOTE: Serviceguard should already be installed on all the cluster nodes. Run swinstall(1m)to install Continentalclusters and Metrocluster Continuous Access products from an SD depot. The toolkit integration scripts, environment file and contributed scripts will reside in the /opt/cmcluster/toolkit/SGCA and /usr/sbin directories. 3. When swinstall(1m) has completed, create a directory as follows for the new package in the recovery cluster. # mkdir /etc/cmcluster/ Create an Serviceguard package configuration file in the recovery cluster. # cd /etc/cmcluster/ # cmmakepkg -p .ascii Customize it as appropriate to your application. Make sure to include the pathname of the control script (/etc/cmcluster// .cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts. Do not usecmmodpkg to enable package switching on any recovery package. Enabling package switching will automatically start the recovery package. Package switching on a recovery package will be automatically set by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package. 4. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard. standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater. Completing and Running a Continental Cluster Solution with Continuous Access XP 175 NOTE: Some of the control script variables, such as VG and LV, on the recovery cluster must be the same as on the primary cluster. Some of the control script variables, such as, FS, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART are probably the same as on the primary cluster. Some of the control script variables, such as IP and SUBNET, on the recovery cluster are probably different from those on the primary cluster. Make sure that you review all the variables accordingly. 5. 6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. 
See the Managing Serviceguard user’s guide for more information on these functions. Copy the environment file template /opt/cmcluster/toolkit/ SGCA/xpca.env to the package directory, naming it pkgname_xpca.env. # cp /opt/cmcluster/toolkit/SGCA/xpca.env \ /etc/cmcluster/pkgname/pkgname_xpca.env 7. 176 Edit the environment file _xpca.env as follows: a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script. b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to/etc/cmcluster/package_name, removing any quotes around the file names. d. Uncomment the DEVICE_GROUP variable and set it to this package’s Raid Manager device group name, as specified in the Raid Manager configuration file. e. Uncomment the HORCMPERM variable and use the default value MGRNOINST if Raid Manager protection facility is not used or disabled. If Raid Manager protection facility is enabled set it to the name of the HORCM permission file. f. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access. g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP h. If you are using asynchronous data replication, set the HORCTIMEOUTvariable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds. i. Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL. 8. Distribute Continentalclusters/Continuous Access configuration, environment and control script files to other nodes in the cluster by using ftp or rcp. # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via .rhosts may create a security issue. 9. Apply the Serviceguard configuration using the cmapplyconf command or SAM. 10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname: bkpbkgname.cntl Metrocluster/Continuous Access package control script bkpkgname_xpca.env Metrocluster/Continuous Access environment file bkpkgname.ascii Serviceguard package ASCII configuration file bkpkgname.sh Package monitor shell script, if applicable other files Any other scripts you use to manage Serviceguard packages 11. 
Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance be started at boot time. NOTE: The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means that the Raid Manager instance must be started before Serviceguard is started. 12. Make sure the packages on the primary cluster are not running. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg) test the recovery cluster for cluster and package startup and package failover. 13. Any running package on the recovery cluster that has a counterpart on the primary cluster should be halted at this time. Completing and Running a Continental Cluster Solution with Continuous Access XP 177 Setting up the Continental Cluster Configuration The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: “Designing a Continental Cluster”. 1. Generate the Continentalclusters configuration. # cmqueryconcl -C cmconcl.config 2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. When data replication is done using Continuous Access XP, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step. 3. 4. 5. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software. On all nodes in both clusters copy the monitor package files from /opt/cmconcl/ scripts to/etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages. The desired result is to have the monitor package start automatically when the cluster is formed. Apply the monitor package to both cluster configurations. # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config 6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/ cmclconfig nor is there an equivalent file for Continentalclusters. # cmapplyconcl -C cmconcl.config 7. Start the monitor package on both clusters. NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status. 8. 178 Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster. 10. View the status of the continental cluster primary and recovery clusters, including configured event data. # cmviewconcl -v The continental cluster is ready for testing. 
(See “Testing the Continental Cluster” (page 91)) Switching to the Recovery Cluster in Case of Disaster It is vital the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following: • • • During an alert, the cmrecovercl will not start the recovery packages unless the -f option is used. During an alarm, the cmrecovercl will start the recovery packages without the -f option. When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This condition applies not only when no alert or alarm was issued, but also applies to the situation where there was an alert or alarm, but the primary cluster recovered and its current status is Up. Failback Scenarios The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as the hardware or networking failures connecting the two sites. The following discussion addresses some of those failures and suggests recovery approaches applicable to environments using data replication provided by HP StorageWorks XP series disk arrays and Continuous Access. In Chapter 2: “Designing a Continental Cluster”, there is a discussion of failback mechanisms and methodologies in “Restoring Disaster Tolerance” (page 99). Scenario 1 The primary site has lost power, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the XP disk array or the operating systems of the systems at the primary site. Scenario 2 The primary site XP disk array experienced a catastrophic hardware failure and all data was lost on the array. Completing and Running a Continental Cluster Solution with Continuous Access XP 179 Failback in Scenarios 1 and 2 After reception of the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. Each Continentalclusters package control script that invokes Metrocluster Continuous Access XP will evaluate the status of the XP paired volumes. Since neither the systems nor the XP disk array at the primary site are accessible, the control file will initially report the paired volumes with a local status of SVOL_PAIR or SVOL_PSUE (in ASYNC mode) and a remote status of EX_ENORMT, PSUE or PSUS, indicating that there is an error accessing the primary site. The control file script is programmed to handle this condition and will enable the volume groups, mount the logical volumes, assign floating IP addresses and start any processes as coded into the script. NOTE: In ASYNC mode, the package will halt unless a force flag is present or unless the auto variable AUTO_SVOLPSUE is set to 1. The fence level of the paired volume—NEVER, ASYNC, or DATA—will not impact the starting of the packages at the recovery site. The Metrocluster CAXP pre-integrated solution will perform the following command with regards to the paired volume. # horctakeover -g -S Subsequently, the paired volume will have a status of SVOL_SWSS. To view the local status of the paired volumes. # pairvolchk -g -s To view the remote status of the paired volumes. 
# pairvolchk -g -c (While the remote XP disk array and primary cluster systems are down, the command will time out with an error code of 242.) After power is restored to the primary site, or when a newly configured array is brought online, the XP paired volumes may have either a status of PVOL_PSUE on the primary site or SVOL_SWSS on the secondary site. The following procedure applies to this situation: 1. While the package is still running, from the recovery host. # pairresync -g -c 15 -swaps This starts the resynchronization, which can take a long time if the entire primary disk array was lost or a short time if the primary array was intact at the time of failover. 2. When resynchronization is complete, halt the Continentalclusters recovery packages at the recovery site. # cmhaltpkg 180 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SVOL_PAIR at the recovery site and PVOL_PAIR at the primary site. 3. 4. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically. Manually start the Continentalclusters primary packages at the primary site. # cmrunpkg 5. Ensure that the monitor packages at the primary and recovery sites are running. Failback when the Primary has SMPL Status Use the following procedure when the primary site paired volumes have a status set to SMPL, possibly through manual intervention: 1. Halt the Continentalclusters recovery packages at the recovery site. # cmhaltpkg This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SMPL at the recovery site and PSUE at the primary site. 2. 3. Start the cluster at the primary site. Assuming they have been properly configured the Continentalclusters primary packages should not start. The monitor package should start automatically. Since the paired volumes have a status of SMPL at both the primary and recovery sites, the XP views the two halves as unmirrored. From a system at the primary site, manually create the paired volume. # paircreate -g -f -vr -c 15 See the XP Raid Manager user’s guide on more paircreate command options. Since the most current data will be at the remote or recovery site, this will synchronize the data from the remote or recovery site (use of the -vr option directs the command to synchronize from the remote site). Wait for the synchronization process to complete before proceeding to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step. 4. Manually start the Continentalclusters primary packages at the primary site. # cmrunpkg Completing and Running a Continental Cluster Solution with Continuous Access XP 181 The control script is programmed to handle this case. The control script recognizes that the paired volume is synchronized and will proceed with the programmed package startup. 5. Ensure that monitor packages are running at both sites. Maintaining the Continuous Access XP Data Replication Environment Resynchronizing After certain failures, data are no longer remotely protected. 
In order to restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the command pairresync. This command must successfully complete for disaster-tolerant data protection to be restored. Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection: • failure of ALL Continuous Access links without restart of the application • failure of ALL Continuous Access links with Fence Level DATA with restart of the application on a primary host • failure of the entire recovery Data Center for a given application package • failure of the recovery XP disk array for a given application package while the application is running on a primary host Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated by moving the application package back to its primary host after repairing the failure. • • • • failure of the entire primary Data Center for a given application package failure of all of the primary hosts for a given application package failure of the primary XP disk array for a given application package failure of all Continuous Access links with application restart on a secondary host NOTE: The preceding steps are automated provided the default value of 1 is being used for the auto variable AUTO_PSUEPSUS. Once the Continuous Access link failure has been fixed, the user only needs to halt the package at the failover site and restart on the primary site. However, if you want to reduce the amount of application downtime, you should manually invoke pairresync before failback. Full resynchronization must be manually initiated (as described in the next section) after repairing the following failures: • • failure of the recovery XP disk array for a given application package followed by application startup on a primary host failure of all Continuous Access links with Fence Level NEVER or ASYNC with restart of the application on a primary host Pairs must be manually recreated if both the primary and recovery XP disk arrays are in the SMPL (simplex) state. 182 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Make sure you periodically review the following files for messages, warnings and recommended actions. It is recommended to review these files after system, data center and/or application failures: • • • /var/adm/syslog/syslog.log /etc/cmcluster//.log /etc/cmcluster/.log Using the pairresync Command The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the Continuous Access link is fixed, depending on which site you are on, use the pairresync command in one of the following two ways: • pairresync -swapp—from the primary site. • pairresync -swaps—from the failover site. These options take advantage of the fact that the recovery site maintains a bit-map of the modified data sectors on the recovery array. Either version of the command will swap the personalities of the volumes, with the PVOL becoming the SVOL and SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now PVOL) are then copied back to the SVOL, which is now running on the primary site. 
During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site, and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site. Some Further Points • This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices due to the time needed to get device status from the XP disk array or to synchronize. NOTE: Long delays in package startup time will occur in those situations when recovering from broken pair affinity. • • The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time due to getting status from the XP disk array. (See the previous paragraph for more information on the extra startup time). Online cluster configuration changes may require a Raid Manager configuration file to be changed. Whenever the configuration file is changed, the Raid Manager instance must be stopped and restarted. The Raid Manager instance must be running before any Continentalclusters package movement occurs. Completing and Running a Continental Cluster Solution with Continuous Access XP 183 • • • 184 A given file system must not reside on more than one XP frame for either the PVOL or the SVOL. A given LVM Logical Volume (LV) must not reside on more than one XP frame for either the PVOL or the SVOL. The application is responsible for data integrity, and must use the O_SYNC flag when ordering of I/Os is important. Most relational database products are examples of applications that ensure data integrity by using the O_SYNC flag. Each host must be connected to only the XP disk array that contains either the PVOL or the SVOL. A given host must not be connected to both the PVOL and the SVOL of a continuous access pair. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 4 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA The HP StorageWorks Enterprise Virtual Array (EVA) allows you to configure data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access EVA software and the additional files that integrate the EVA with Serviceguard clusters. It then shows how to configure metropolitan cluster solutions using Continuous Access EVA. The topics discussed in this chapter are: • • • • • Files for Integrating the EVA with Serviceguard Clusters Overview of EVA and Continuous Access EVA Concepts Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA Building a Metrocluster Solution with Continuous Access EVA Completing and Running a Continental Cluster Solution with Continuous Access EVA Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature called the Site Controller package to provide disaster tolerance for workload databases. This solution is currently implemented for the Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323). 
Files for Integrating the EVA with Serviceguard Clusters Metrocluster consists of a script, program files, and an environment file that work in an Serviceguard metropolitan cluster to automate failover to alternate nodes in the case of a disaster. The Metrocluster Continuous Access EVA product contains the following files. Table 4-1 Metrocluster Continuous Access EVA Template Files Name Description /usr/sbin/DRCheckDiskStatus The script that checks for a specific environment file in the package directory and executes the specific storage DR check program. This file should not be edited. /usr/sbin/DRCheckCA EVADevGrp The program that manages the Continuous Access EVA DR group that is used by the package. /usr/sbin/smispasswd The utility that is used to define the information about Management Server and SMI-S that are used in the solution. /usr/sbin/evadiscovery The utility that is used to define the information about EVA storage and DR groups that are used in the solution. Files for Integrating the EVA with Serviceguard Clusters 185 Table 4-1 Metrocluster Continuous Access EVA Template Files (continued) Name Description /opt/cmcluster/toolkit/SGCA EVA/smiseva.conf The Metrocluster Continuous Access EVA Management Server and SMI-S configuration template. This file must be edited for the specific Management Server and SMI-S information before using it. /opt/cmcluster/toolkit/SGCA EVA/mceva.conf The Continuous Access EVA configuration template. This file must be edited for the specific EVA storage cells and DR Group information to be used in a Metrocluster environment before using it. /opt/cmcluster/toolkit/SGCA EVA/caeva.env The Metrocluster Continuous Access EVA environment file. This file must be customized for specific EVA DR groups and Serviceguard packages. Copies of this file must be customized for each separate Serviceguard package. /opt/cmcluster/toolkit/SGCA EVA/Samples A directory containing sample convenience shell scripts that must be edited before using. These shell scripts may help to automate some configuration tasks. These scripts are contributed, and not supported. Metrocluster Continuous Access EVA software has to be installed on all nodes that will run a Serviceguard package whose data is on an HP StorageWorks EVA and where the data is replicated to a second EVA using the Continuous Access EVA facility. In the event of a node failure, the integration of Metrocluster Continuous Access EVA with the package will allow the application to fail over in the following ways: • • Among local host systems attached to the same EVA. Between one system that is attached locally to its EVA and another “remote” host that is attached locally to the other EVA. Configuration of Metrocluster Continuous Access EVA must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster Continuous Access EVA, Command View EVA and SMI-S EVA must also be installed and configured on the Management Server. Overview of EVA and Continuous Access EVA Concepts Continuous Access EVA provides remote data replication from primary EVA systems to remote EVA systems. Continuous Access EVA uses the remote-copy function of the Hierarchical Storage Virtualization (HSV) controller running the controller software (VCS or XCS) to achieve host-independent data replication. This section describes some basic Continuous Access EVA terminology, concepts, and features. 
The topics discussed are: • • • 186 Data Replication Copy Sets DR Groups Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA • • • Log Disk Managed Sets Failover Metrocluster with EVA and Data Replication The HSV controller pairs at the primary location are connected to their partner HSV controller pairs at the alternate location. To configure storage for data replication, a source Vdisk is specified in the primary storage system. The destination Vdisk is then created by the controller software at the remote storage system. As data is written to the source Vdisk, it is mirrored to the destination Vdisk. Applications continue to run while data replication goes on in the background over a separate interconnect. When a storage system contains both source Vdisks and destination Vdisks, it is said to be bidirectional. A given storage system can have a bi-directional data replication relationship with only one other storage system, and an individual Vdisk can have a uni-directional replicating relationship with only one other Vdisk. The remote copy feature is intended not only for disaster recovery, but also to replicate data from one storage system or physical site to another storage system or site. It also provides a method for performing a backup at either the source or destination site. DR Groups A data replication (DR) group is a software construct comprising one or more Vdisks in an HSV storage system so that they: • Replicate to the same specified destination storage array • Fail over together • Preserve write order within the data replication collection groups • Share a log disk All virtual disks used for replication must belong to a DR group, and a DR group must contain at least one Vdisk. A DR group can be thought of as a collection of copy sets. The replicating direction of a DR group is always from a source to a destination. By default, the storage system on which the source Vdisk is created is called the home storage system. The home designation denotes the preferred storage system for the source and this designation can be changed to another storage system. A DR group contains pointers to another DR group for replication. A DR group replicating from a home storage system to a destination system is in the original state. When replication occurs from a storage system that was created as the destination to the home storage system (for example, after a failover, which is discussed later), it is in a reversed state. Overview of EVA and Continuous Access EVA Concepts 187 DR Group Properties Properties are defined for every DR group that is created. DR group properties are described below: • • • • • • Name: A unique name given to each DR group. HP recommends that the names of replicating DR groups at the source and destination be the same. DR Mode — Source: A DR group established as an active source that replicates to a passive destination. — Destination: A DR group established as a passive destination that receives replication data from an active source. Failsafe mode: When this mode is enabled, all source Vdisks become both unreadable and unwritable if the destination Vdisk is unreachable. This condition is known as failsafe-locked and may require immediate intervention. When the failsafe mode is disabled and the destination Vdisk is unreachable, normal logging occurs. Connected system: A pointer to the storage system where the DR group is replicated. 
Write mode: — Asynchronous mode: When a write operation provides an I/O completion acknowledgement to the host after data is delivered to cache at the source controller, but before data delivery to cache on the destination controller. — Synchronous mode: An I/O completion acknowledgement is sent to the host after data is written to the source and destination caches. Suspension: — Suspend: When this command is enabled and failsafe mode is not enabled, I/O replication is halted between the source and destination Vdisks. Source Vdisks continue to run I/O locally and the I/O is also copied to a log Vdisk. — Resume: When this command is enabled, replication resumes between the source and destination Vdisks. Merging of the log Vdisk or a full copy is also performed. Log Disk The DR group has storage allocated on demand called a log. The virtual log collects host write commands and data if access to the destination storage system is severed. When a connection is later re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging. Sometimes it is more practical to copy the source Vdisk directly to the destination Vdisk. This copy operation is called a “full copy-all 1-MB”. There is no manual method for forcing a full copy. It is an automatic process that occurs when a log is full. If synchronous replication is configured, a log can be in one of the following states: 188 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA • • • Normal: No source Vdisk is logging or merging. Logging: At least one source Vdisk is logging (capturing host write commands), but none are merging. Merging: At least one source Vdisk is merging and logging. When a DR group is in a logging state, the log will grow in proportion to the amount of write I/O being sent to the source Vdisks. As the log grows, more space must be allocated out of the available capacity of the disk group where it is a member. The capacity available to the log does not include the spare capacity or any capacity being used by Vdisks, snapshots, or Snapclones. This means the log disk will never overwrite any other data. Similarly, when a DR group is logging, the available capacity for creating Vdisks, snapshots, and Snapclones does not include the capacity already used by the log disk. Therefore, a log disk will never be overwritten by any other data. When creating disk groups and distributing Vdisks within them, sufficient capacity must remain for log disks to expand to their maximum level. The log is declared full, and reaches its maximum level, whenever the first of the following conditions is reached: • The size of the log data file exceeds twice the capacity of the DR group. • No free space remains in the physical disk group. • The log reaches 2 TB of Vraid1 (4 TB total). Copy Sets Vdisks are user-defined storage allotments of virtual or logical data storage. A pairing relationship can be created to automatically replicate a logical disk to another logical disk. The generic term for this is a copy set. A relationship refers to the arrangement created when two storage systems are partnered for the purpose of replicating data between them. A Vdisk does not have to be part of a copy set. Vdisks at any site can be set up for local storage and used for activities such as testing and backup. Clones and snapclones are examples of Vdisks used in this manner. 
When a Vdisk is not part of a copy set, it is not disaster tolerant, but it can use various Vraid types for failure tolerance. Managed Sets A managed set is a collection of DR groups selected for the purpose of managing them. For example, a managed set can be created to manage all DR groups of a particular application that reside in separate storage arrays. Failover The recovery process whereby one DR group, managed set, fabric, or controller switches over to its backup is called a failover. The process can be planned or unplanned. A planned failover allows an orderly shutdown of the system before the redundant system takes over. An unplanned failover occurs when a failure or outage occurs that may not Overview of EVA and Continuous Access EVA Concepts 189 allow an orderly transition of roles.Listed below are several types of Continuous Access EVA failovers: • • • • DR group failover: An operation to reverse the replication direction of a DR group. A DR group can have a relationship with only one other DR group, and a storage system can have a relationship with only one other storage system. Managed set failover: An operation to reverse the replication direction of all DR groups in the managed set. Fabric or path failover: The act of transferring I/O operations from one fabric or path to another. Controller failover: When a controller assumes the workload of its partner (within the same storage system). Continuous Access EVA Management Software Metrocluster Continuous Access EVA requires the following two software components to be installed in the Management Server: • • HP StorageWorks Command View EVA (CV EVA). This software component allows you to configure and manage the storage and DR group via a web browser interface. Storage Management Interface Specification (SMI-S). The SMI-S EVA software provides the Storage Management Interface Specification (SMI-S) interface for the management of EVA arrays. Metrocluster Continuous Access EVA software uses WBEM API to communicate with SMI-S to automatically manage the DR Groups that are used in the application packages. Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over. Setting up the Storage Hardware 1. 2. 3. 4. 190 Before configuring Metrocluster Continuous Access EVA, the EVA must be correctly cabled with redundant paths to each node in the cluster that will run packages accessing data on the array. Install and configure the hardware components of the EVA, including HSV controllers, disk arrays, SAN switches, and Management Server. Install and configure CV EVA and SMI-S EVA on the Management Server. For the installation and configuration process, refer to the HP StorageWorks Command View EVA Installation Guide. Start CV EVA User Interface (CV EVA-UI). You can configure virtual disks and DR groups using the CV EVA web user interface shown in Figure 4-1. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA Figure 4-1 Configuration of Virtual Disks and DR groups For more detailed information on setting up Command View EVA for configuring, managing and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA User Guide. After a DR group is created, only the source volume (primary volume) is visible and accessible with Read/Write mode. 
The destination volume (secondary volume) by default is not visible and accessible to its local hosts. The destination volume access mode needs to be changed to Read-only mode before the DR group can be used. The destination volumes need to be presented to its local host. NOTE: In the Metrocluster Continuous Access EVA environment, it is required that the destination volume access mode be set to read-only mode. The destination Vdisk read-only mode can be changed by using the SSSU command for HP-UX. When executing the SSSU command, it needs to be executed against the storage cell that holds the source Vdisk of the DR group. For users who are not familiar with the SSSU command, an input sample file is provided below and in the following location: /opt/cmcluster/toolkit/SGCAEVA/ Samples/sssu_sample_input. select manager 15.13.244.182 user=administrator pass=administrator select system DC-1 set DR_GROUP “\Data Replication\DRG_DB1” accessmode=readonly show DR_GROUP “\Data Replication\DRG_DB1” NOTE: For more detailed information on the sssu commands used in the sample input file, refer to the sssu ReadMe file found at /opt/cmcluster/toolkit/ SGCAEVA/Samples/Readme.sssu_sample_\ input Follow the steps below when copying and editing the sample file: Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 191 1. Copy the sample file /opt/cmcluster/toolkit/SGCAEVA/Samples/ sssu_sample_input to the /etc/dtsconf/directory. # cp /opt/cmcluster/toolkit/SGCAEVA \/Samples/sssu_sample_input /etc/dtsconf/sssu_input 2. 3. Customize the file sssu_input. After you customize the sssu_input file, run the SSSU command as follows to set the destination Vdisk to read-only mode # /sbin/sssu “FILE ” 4. After changing the access mode of the destination Vdisk, it is necessary to run the ioscan command and the insf command on remote clustered nodes to create the special device file name for the destination Vdisk on remote EVA. Cluster Configuration For detailed information on Serviceguard cluster configuration, refer to the Managing Serviceguard user’s guide. The following information pertains to cluster configuration in a EVA Continuous Access environment. First create a Serviceguard cluster without specifying cluster-aware volume groups in the cluster configuration ASCII file. This is necessary because the LUNs in the EVA storage units are not read/write to all cluster nodes at configuration time. Only the LUNs configured as source volumes are read/write on one cluster site. The remote site can see those LUNs with read-only mode and therefore, the cmapplyconf command cannot succeed if volume groups are specified in the file. Volume groups are created and made cluster aware in separate steps, shown in the “Configuring Volume Groups” (page 200) of this chapter. NOTE: If your ASCII file contains volume group definitions derived from the LUNs visible on the source node, comment them out before running the cmapplyconfcommand. Management Server/SMI-S and DR Groups Configuration The Metrocluster Continuous Access EVA product provides two utility tools for users to provide Metrocluster Continuous Access EVA software the information about the SMI-S EVA service running on the Management Servers and DR groups that will be used in Metrocluster Continuous Access EVA environment. 
This section discusses the smispasswd and evadiscovery tools, including the description of the tools, the tool operations, and the input file templates.The first utility, called smispasswd, is a Command Line Interface (CLI) that provides functions for defining Management Server list and SMI-S username and password pair. The second utility, called evadiscovery,is also a CLI that provides functions for defining EVA storage cells and DR group information. 192 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA When Metrocluster Continuous Access EVA program requests a storage state, it sends a request message to a local Management Server. For preparing the message, several data items need to be available so that the Metrocluster Continuous Access EVA program knows which Management Server it will communicate with. These data items include Management Server's hostname/IP address, and SMI-S username/password. Before configuring and bringing up any Metrocluster package, this is the first information that needs to be configured. Metrocluster software communicates with the SMI-S service running on the Management Server, which communicates with the EVA controller. When querying EVA storage states through the SMI-S, the code first needs to find the internal device IDs by querying and searching for a list of devices information. These processes take time and are not necessary since the IDs are static in the EVA system. To improve the query performance, the software will cache these IDs in the clustered nodes. To cache the object IDs in the clustered nodes, it is required to run the evadiscovery tool after the EVA and Continuous Access EVA are configured, and the storage is accessible from the hosts. The tool will query the active Management Server for the needed information and save it in a mapping file. It is necessary to distribute the mapping file to all the clustered nodes. Defining Management Server and SMI-S Information To define Management Server and SMI-S information use the smispasswd tool. The following steps describe the options for defining Management Server and SMI-S information: Creating the Management Server List On a host that resides on the same data center as the active management server, create the Management Server list using an input file, use the following steps: 1. 2. Create a configuration input file (A template of this file can be found in /opt/ cmcluster/toolkit/SGCAEVA/smiseva.conf). Copy the template file /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf to the /etc/dtsconf/ directory. # cp /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf \/etc/dtsconf/smiseva.conf 3. For each Management Server in your configuration (both local and remote sites), enter the Management Server’s hostname or IP address, the administrator login name, type of connection (secure or non-secure), and SMI-S name space. An example of the smiseva.conf file is as follows: ############################################################## ## ## ## smiseva.conf CONFIGURATION FILE (template)for use with ## ## the smispasswd utility in the Metrocluster Continuous ## ## Access EVA Environment. ## Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 193 ## Note: This file MUST be edited before it can be used. ## ## For complete details about Management Server/SMI-S ## ## configuration for use with Metrocluster Continuous ## ## Access EVA, consult “Designing Disaster Tolerant High ## ## Availability Clusters. 
## ############################################################## ## This file provides input to the smispasswd utility, ## ## which you use to set up secure access paths between ## ## cluster nodes and SMI-S services. ## ## Edit this file to include the appropriate information ## ## about the SMI-S services that will be used in your ## ## Metrocluster Continuous Access EVA environment. ## ## After entering all the desired information, run the ## ## smispasswd command to generate the security ## ## configuration that allows cluster nodes to communicate ## ## with the SMI-S services. ## ## Below is an example configuration. The data is ## ## commented out. ## ## Hostname/IP_Address User_login_name Secure Namespace ## ## IP_Address Connection ## ## 15.13.244.182 administrator y root/EVA ## ## 15.13.244.183 administrator y root/EVA ## ## 15.13.244.192 admin12309 y root/EVA ## ## SANMA04 admin y root/EVA ############################################################ ## The example shows a list of 4 Management Server/SMI-S ## ## data in the Metrocluster Continuous Access EVA ## ## environment. Each line represents a different SMI-S’s ## ## data; fields on each line should be separated either by ## ## space(s)or tab(s). The order of fields is significant. ## ## The first field must be a hostname or IP address, the ## ## second field must be a user login name on the host. The ## ## third field must be ‘y’ or ‘n’ to use SSL connect. The ## ## last field must be the namespace of the SMI-S service. ## ## For details of each field data, refer to the smispasswd ## ## man page, ‘man smispasswd’. ## ############################################################# ## Note: Lines beginning with the pound sign (#) are ## ## comments. You # cannot use the ‘#’ character in your ## ## data entries. Enter your SMI-S services data under the ## ## dashed lines: ## ## Hostname/IP_Address User_login_name Secure Namespace ## ## IP_Address Connection ## 15.13.172.11 administrator n root/EVA ## ## 15.13.172.12 administrator n root/EVA ## ############################################################### ## ## Fill in the Management Server information for each Management Server in your cluster configuration. 194 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA NOTE: list. Ensure that you place the active Management Server information first on the Creating the Management Server Mapping File Use the smispasswdcommand to create or modify the Management Server information stored in the mapping file. For each Management Server listed in the file, there will be a password prompt displayed. A username and password are required because of the security protocol for EVA and is created by your system administrator when the Management Server is configured. Input the password associated with the username of the SMI-S. Then, re-enter it (as prompted) to verify that it is correct. Example: # smispasswd -f /etc/dtsconf/smiseva.conf NOTE: For more information on configuring the username and password for SMI-S on the management server, refer HP StorageWorks Command View EVA Installation Guide. Enter password of 15.13.172.11: ********** Re-enter password of 15.13.172.11: ********** Enter password of 15.13.172.12: ********** Re-enter password of 15.13.172.12: ********** All the Management Server information has been successfully generated. When all the passwords have been entered, the configuration is written to the map file /etc/dtsconf/caeva.map. 
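For reference, the steps above can be collected into a short root-run sequence on the configuration node. This is a minimal sketch only; the entries placed in smiseva.conf (hostname or IP address, login name, secure-connection flag, and namespace) are illustrative and must match your own Management Servers. Verify the result afterwards with smispasswd -l, as described below.

#!/usr/bin/sh
# Create the SMI-S configuration from the supplied template, then
# generate the mapping file; smispasswd prompts for each server's
# password and writes the result to /etc/dtsconf/caeva.map.
cp /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf /etc/dtsconf/smiseva.conf

# Edit /etc/dtsconf/smiseva.conf and add one line per Management Server,
# for example:   15.13.172.11  administrator  n  root/EVA

smispasswd -f /etc/dtsconf/smiseva.conf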
Setting a Default Management Server

Use the smispasswd command to set the active Management Server that is to be used by the EVA discovery tool, which is discussed later in this section.

Example:
# smispasswd -d 15.13.172.12
The Management Server 15.13.172.12 has been set as the default active SMI-S.

Displaying the List of Management Servers

Use the smispasswd command to display the current list of storage management servers that are accessible by the cluster software.

Example:
# smispasswd -l
MC/CAEVA Server list:
HOST             USERNAME        USE_SSL    NAMESPACE
-------------------------------------------------------
15.13.172.11     administrator   N          root/EVA
15.13.172.12     administrator   N          root/EVA

Adding or Updating Management Server Information

To add or update individual Management Server information, use the following command options shown in Table 4-2:

smispasswd -h <hostname> -n <namespace> -u <username> -s <y|n>

Table 4-2 Individual Management Server Information

Command Options    Description
-h                 This is either a DNS resolvable hostname or IP address of the Management Server.
-n                 This is the name space configured for the SMI-S CIMOM¹. The default namespace is root/EVA.
-u                 This is the user name used to connect to SMI-S. The user name and password are the same as those used with the sssu tool.
-s                 This option specifies the type of connection to be established between the Metrocluster software and the SMI-S CIMOM. “y” allows a secure connection to the Management Server using the HTTPS protocol (HTTP using Secure Socket Layer encryption). “n” means a secure connection is not required.

¹ CIMOM - Common Information Model Object Manager, a key component that routes information between providers and clients.

When you issue the command with these options, the “Enter password:” prompt asks you to input the password associated with the username. The “Re-enter password:” prompt then asks you to enter the same password again for verification. The command then either adds new or updates existing Management Server information in the map file: it adds a new record if it does not find the specified hostname in the mapping file; otherwise it only updates the existing record.

Examples:
% smispasswd -h 15.13.244.202 -u administrator -n root/EVA -s y
Enter password: **********
Re-enter password: ********
A new information has been successfully created
%
% smispasswd -h 15.13.244.203 -u administrator -n root/EVA -s n
Enter password: **********
Re-enter password: ********
A new information has been successfully created
%
% smispasswd -h 15.13.244.202 -s n
Enter password: **********
Re-enter password: ********
The information has been successfully updated
%

Deleting a Management Server

To delete a Management Server from the group used by the cluster, use the smispasswd command with the -r option.

Example:
# smispasswd -r 15.13.172.12
The Management Server 15.13.172.12 has been successfully removed from the file

Defining EVA Storage Cells and DR Groups

On the same node on which the Management Server list was created, define the EVA storage cell and DR group information to be used in the Metrocluster Continuous Access EVA environment, using the evadiscovery tool with the following steps:
1. Create a configuration input file. This file will contain the names of storage pairs and DR groups.
(A template of this file can be found in /opt/cmcluster/ toolkit/SGCAEVA/mceva.conf) 2. Copy the template file /opt/cmcluster/toolkit/SGCA EVA/ mceva.conf)to the /etc/dtsconf directory: # cp /opt/cmcluster/toolkit/SGCAEVA/mceva.conf \ /etc/dtsconf/mceva.conf 3. 4. 5. For each pair of storage units, enter the WorldWideName (WWN) of the first and second storage units. The WWN can be found on the front of the panel of the EVA controller or from the Command View EVA user interface. For each pair of storage units, enter the names of all DR groups that are managed by that storage pair. Save the file. The following is an example of the mceva.conf file. Fill in the file as in the following example: ############################################################## ## mceva.conf CONFIGURATION FILE (template) for use with ## ## the evadiscovery utility in the Metrocluster Continuous ## Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 197 ## Access EVA Environment. ## ## Version: A.01.00 ## ## Note: This file MUST be edited before it can be used. ## ## For complete details about EVA configuration for use ## ## with Metrocluster Continuous Access EVA, consult the ## ## manual “Designing Disaster Tolerant High Availability ## ## Clusters”. ## ############################################################## ## This file provides input to the evadiscovery utility, ## ## which you use to generate the /etc/dtsconf/caeva.map ## ## file. During Metrocluster Continuous Access EVA ## ## configuration, this file is copied to all cluster nodes. ## ## Edit the file to include the appropriate data about the ## ## EVA storage systems and DR groups that will be used in ## ## your Metrocluster Continuous Access EVA environment. ## ## After entering all the desired information, run the ## ## evadiscoverycommand to generate the mapping data and save## ## it in a map file. ## ## Note: Before running evadiscovery, you need to use the ## ## smispasswd command to create a SMI-S services ## ## configuration. ## ## Enter the data for storage device pairs and DR groups ## ## after the and tags. ## ## The tag represents the starting ## ## definition of a storage pair and its DR groups. Under a ## ## tag, you must provide two storage ## ## Node World Wide Name (WWN)which both contain the DR groups## ## defined under the tag. You can define as ## ## many DR groups as you need, but each DR group must belong ## ## to only one of the storage pairs. A storage pair can have ## ## a maximum of 64 DR groups. ## ## Note that you can find storage Node World Wide Names form ## ## the front panel of your EVA controllers or from the ## ## ‘Initialized Storage Properties’ page of command view ## ## EVA through your Web browser. ## ## Below is an example of a configuration with two storage ## ## pairs (4 storage units). The first storage pair contains ## ## 2 DR groups and the second pair contains 1 DR group. ## ## ## ## “5000-1FE1-5000-4280” Enter first storage WWN in double ## ## quotes. ## ## “5000-1FE1-5000-4180” Enter second storage WWN in double ## ## quotes. ## ## ## ## “DR Group - Package1” Enter a DR group name in double ## ## quotes. ## ## “DR Group - OracleDB1” Enter a DR group name in double ## ## quotes. ## ## ## ## “5000-1FE1-5000-4081” Enter first storage WWN in double ## ## quotes. 
## ## “5000-1FE1-5000-4084” Enter second storage WWN in double ## 198 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## quotes. "DR Group - Package2” Enter a DR group name in double quotes. Note:Since '#’ meant a start of a comment, you cannot include the ‘#’ in any , , storage name and DR group name. ## ## ## ## ## ## ## ## Note: All the storage and DR Group names should be ## enclosed in double quotes (““), otherwise the ## evadiscovery command will not detect them. ## Enter your MC EVA Storage pairs and DR Groups under the ## # dashed lines: ## ----------------------------------------------------------## ## “5000-1FE1-5000-00DF” ## “5000-1FE1-5000-00DE” ## ## “DR Group 1” ## “DR Group 2” ## “DR Group 3” ## “DR Group 4” ## Creating the Storage Map File After completing the EVA Storage Cells and DR Groups configuration file, use the EVA discovery utility to create or modify the storage map file stored on the configuration node. # evadiscovery -f /etc/dtsconf/mceva.conf % Verifying the storage systems and DR Groups ……… Generating the mapping data ………… Adding the mapping data to the file /etc/dtsconf/caeva.map ……… The mapping data is successfully generated. The command generates the mapping data and stores it in /etc/dtsconf/caeva.map The mapping file/etc/dtsconf/caeva.mapcontains information of the Management Servers as well as information of the EVA Storage Cells and DR Groups. Copying the Storage Map File After running the smispasswd and evadiscovery commands to generate the /etc/ dtsconf/caeva.map file, copy this file to all cluster nodes so that they can be used by Metrocluster Continuous Access EVA to communicate with the EVA units. Be sure to use the same full pathname. Displaying Information about Storage Devices Use the evadiscovery command to display information about the storage systems and DR groups in your configuration. Example: Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 199 # evadiscovery -l % MC EVA Storage Systems and DR Groups map list: Storage WWN: 5000-1FE1-5000-4280 DR Group Name: DR Group - PkgA DR Group Name: DR Group - PkgB Storage WWN: 5000-1FE5-5000-4288 DR Group Name: DR Group - PkgA DR Group Name: DR Group - PkgB NOTE: Before running the evadiscovery command, the management server configuration must be completed using the smispasswd. Otherwise, the evadiscovery command, will fail. NOTE: Run the discovery tool after all storage DR Groups are configured or when there is any change to the storage device. For example, the user removes and recreates a DR group that is used by an application package. In this case the DR Group's internal IDs are regenerated by the EVA system. Update the external configuration file if any name of storage systems or DR groups is changed, run the evadiscovery utility, and redistribute the map file /etc/dtsconf/caeva.mapto all Metrocluster clustered nodes. Verifying the EVA Configuration Use the following checklist to verify the configuration. Figure 4-2 EVA Configuration Checklist Redundant Management Servers configured and accessible to all nodes. Source and Destination volumes created for use with all packages. Management Servers Security configuration is complete (smispasswd command). EVA mapping is complete (evadiscovery command). /etc/dtsconf/caeva.map file is copied to all cluster nodes. 
Configuring Volume Groups This section describes the required steps to create a volume group for use in a Metrocluster Continuous Access EVA environment. Identifying Special Device File Name for Vdisk in DR Group using Secure Path V3.0D or V3.0E For each Vdisk in a DR group use CV EVA to retrieve its own unique World Wide Name (WWN) identifier. To identify the special device file name for the matching WWN identifier in a single clustered node use: # spmgr display 200 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA Below is a sample output after running the spmgr command: TGT/LUN 0/ 3 Device c12t0d3 WWLUN_ID H/W_Path 6000-1FE1-0016-6C30-0009-2030-2549-000A 255/0.0.3 Path_Instance HBA Controller Path_Status ZG20302549 c4t0d4 c10t0d4 Controller Path_Instance Path_Status ZG20400420 c6t0d4 c8t0d4 0/ 4 Preferred? td1 td3 HBA no no Active no Available Preferred? td1 td3 no no Standby no Standby c12t0d4 6000-1FE1-0016-6C30-0009-2030-2549-000E 255/0.0.4 Path_Instance HBA Controller Path_Status ZG20302549 c4t0d3 c10t0d3 Controller Path_Instance Path_Status ZG20400420 c6t6d3 c8t6d3 #_Paths 4 4 Preferred? td1 td3 HBA no no Active no Available Preferred? td1 td3 no no Standby no Standby From the output file, look for the special device file name that corresponds to the WWN identifier of the Vdisk in the DR group. Use the special device file while creating the volume group, which is described in section, “Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F”. The EVA Command View for the WWN Identifier of the Vdisk is shown in Figure 4-3. Figure 4-3 EVA Command View for the WWN Identifier For more detailed information on setting up Command View EVA for configuring, managing, and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA Getting Started Guide. Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 201 Identifying Special Device Files using Secure Path v3.0F As described in the previous section, for each Vdisk in a DR group, use CV EVA to retrieve its own unique World Wide Name (WWN) identifier. When Secure Path v3.0F is used for path failover capabilities all the paths to the vdisk are visible. To identify the special device file names for the matching WWN identifier. 
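When many Vdisks are presented to a node, the spmgr output can be long, and filtering on the Vdisk WWN narrows it down. The following is a minimal sketch, assuming the spmgr display layout shown above and using the example WWN from that output; depending on the Secure Path version, the device file may appear on or just before the line holding the WWLUN_ID, so the printed window may need adjusting.

#!/usr/bin/sh
# Print the portion of the spmgr display that belongs to one Vdisk WWN,
# so the matching cXtYdZ device file can be read off nearby.
WWN=6000-1FE1-0016-6C30-0009-2030-2549-000A    # example WWN from the output above
spmgr display | awk -v wwn="$WWN" '
    index($0, wwn) { hit = 1 }     # start printing at the line containing the WWN
    hit            { print; n++ }
    n > 8          { exit }        # stop after roughly one LUN entry
'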
# autopath display Below is a sample output after running the autopath command: ================================================================ HPswsp Version : A.3.0F.00F.00F ================================================================= Array WWN : 5000-1FE1-5000-2EE0 ================================================================= Lun WWN : 6005-08B4-0010-0E01-0001-B000-0287-0000 Load Balancing Policy : No Load Balancing ================================================================= Device Path Status ================================================================= /dev/dsk/c3t0d1 Active /dev/dsk/c9t0d1 Active /dev/dsk/c15t0d1 Active /dev/dsk/c21t0d1 Active /dev/dsk/c4t0d1 Active /dev/dsk/c10t0d1 Active /dev/dsk/c16t0d1 Active /dev/dsk/c22t0d1 Active ================================================================= Lun WWN : 6005-08B4-0010-0E01-0001-B000-028E-0000 Load Balancing Policy : No Load Balancing ================================================================= Device Path Status ================================================================= /dev/dsk/c3t0d2 Active /dev/dsk/c9t0d2 Active /dev/dsk/c15t0d2 Active /dev/dsk/c21t0d2 Active /dev/dsk/c4t0d2 Active /dev/dsk/c10t0d2 Active /dev/dsk/c16t0d2 Active /dev/dsk/c22t0d2 Active From the output display identify the device file listing that corresponds with the WWN of the vdisk in the DR group. In the above sample listing there are eight device files that correspond to different paths of the same vdisk. Use any one of the device file names while creating a volume group, which is described in section, “Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F”. The CV EVA display can be used to identify the WWN for a vdisk. 202 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA Identifying Special Device Files for PVLinks Configuration LVM PVlink feature can be used to handle path failovers to a storage device. The following describes how to identify device files for a Vdisk while setting up volume group using PVlinks. Use the RSM HV Mapper Tool or the HP StorageWorks EVAInfo tool to display the special device files that correspond to the WWN of the vdisk in the DR group. For RSM HV Mapper Tool # RSM_HV_Mapper.pl Collecting Host Volume info might take time. Please wait.Collecting Host Volume info from mc-node1.cup.hp.com.Collecting Host Volume info from mc-node2.cup.hp.com.Collecting Host Volume info from mc-node3.cup.hp.com.Collecting Host Volume info from mc-node4.cup.hp.com.Collecting Host Volume data done. See HostVolTable.txt for results. The HostVolTable.txt output file provides a mapping of the devices file to vdisks for all the hosts that are RSM enabled. In addition, the tool displays the WWID of the vdisk and the storage system to which the vdisk belongs. In the following sample listing there are eight device files that correspond to different paths to the same vdisk. Use all the device files identified while creating a volume group which is described in section, “Configuring Volume Groups using PVLinks”. 
=======================  mc-node1.cup.hp.com  =======================
Virtual Disk Name..: \\XL-1\Vdisk001-DRGSynDCN
Disk...............: /dev/dsk/c16t0d1
Disk...............: /dev/dsk/c17t0d1
Disk...............: /dev/dsk/c18t0d1
Disk...............: /dev/dsk/c20t0d1
Disk...............: /dev/dsk/c12t0d1
Disk...............: /dev/dsk/c13t0d1
Disk...............: /dev/dsk/c14t0d1
Disk...............: /dev/dsk/c15t0d1
World Wide Lun ID..: 6005-08b4-0010-203d-0000-6000-0017-0000

Virtual Disk Name..: \\XL-1\Vdisk002-DRGSynDCS
Disk...............: /dev/dsk/c16t0d5
Disk...............: /dev/dsk/c17t0d5
Disk...............: /dev/dsk/c18t0d5
Disk...............: /dev/dsk/c20t0d5
Disk...............: /dev/dsk/c12t0d5
Disk...............: /dev/dsk/c13t0d5
Disk...............: /dev/dsk/c14t0d5
Disk...............: /dev/dsk/c15t0d5
World Wide Lun ID..: 6005-08b4-0010-299b-0000-a000-002f-0000

For more information on configuring and installing the RSM and RSM_HV_Mapper tools, contact your HP representative.

For EVAInfo Tool

# evainfo -w wwn

This command displays device file information for the vdisk with the specified vdisk WWN. Use HP StorageWorks Command View EVA to determine the WWN of the vdisks. The tool also displays the connected port, the controller on the EVA, and whether the device path is active and optimized for I/O for that LUN. The following sample displays eight device files that correspond to different paths to the same vdisk. Use all the device files identified while creating a volume group, which is described in section "Configuring Volume Groups using PVLinks".

Devicefile          Array                WWNN                                       Capacity   Controller/Port/Mode
/dev/rdsk/c12t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-A/FP-1/NonOptimized
/dev/rdsk/c13t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-A/FP-3/NonOptimized
/dev/rdsk/c14t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-A/FP-2/NonOptimized
/dev/rdsk/c15t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-A/FP-4/NonOptimized
/dev/rdsk/c16t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-B/FP-1/Optimized
/dev/rdsk/c17t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-B/FP-3/Optimized
/dev/rdsk/c18t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-B/FP-2/Optimized
/dev/rdsk/c19t1d6   5000-1FE1-5007-DBD0  6005-08B4-0010-78F1-0000-E000-0034-0000    25600MB    Ctl-B/FP-4/Optimized

Following is a sample output of the command on HP-UX 11i v3:

# evainfo -P -w wwn
Devicefile          Array                WWNN                                       Capacity   Controller/Port/Mode
/dev/rdisk/disk10   5000-1FE1-5007-DBA0  6005-08B4-0010-786B-0000-A000-02DB-0000    2048MB     Ctl-A/FP-3/NonOptimized

For more information on using the EVAInfo tool, see the HP StorageWorks EVAInfo Release Notes.

Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F

Use the following procedure to create volume groups for source volumes and export them for access by other nodes.

NOTE: Create volume groups only for source storage on a locally connected EVA unit. To create volume groups for source volumes on an EVA unit located at the remote site, it is necessary to log onto a node located at that site before configuring the volume groups.
The sample script mk1VGs in the /opt/cmcluster/toolkit/SGCAEVA/Samples directory can be modified to automate these steps.
1. Define the appropriate Volume Groups on each node that might run the application package. Use the following commands:
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
where the name /dev/vgname and the minor number nn are unique within the cluster.
2. Create the Volume Group on the source volume. Use the following commands:
# pvcreate -f /dev/rdsk/cxtydz
# vgcreate /dev/vgname /dev/dsk/cxtydz
3. Create the logical volume(s) for the volume group.
4. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
5. Start the cluster and clusterize the Volume Groups.
# cmruncl (if cluster is not already up and running)
# vgchange -c y /dev/vgname
6. Test activating the Volume Groups with the exclusive option.
# vgchange -a e /dev/vgname
7. Create a backup configuration file that contains the cluster ID (at this point the ID is already present on the disks/LUNs).
# vgcfgbackup /dev/vgname
8. Use the vgexport command with the -p option to export the Volume Groups on the primary system without removing the HP-UX device files.
# vgexport -s -p -m mapfile /dev/vgname
Make sure that you copy the map files to all of the nodes. The sample script Samples/ftpit shows a semi-automated way (using ftp) to copy the files; only the password needs to be entered interactively.
9. De-activate the volume group.
# vgchange -a n /dev/vgname

Configuring Volume Groups using PVLinks

Use the following steps to create volume groups for source volumes using PVLinks and export them for access by other nodes.

NOTE: Create volume groups only for source storage on a locally connected EVA unit. To create volume groups for source volumes on an EVA unit located at the remote site, it is necessary to log onto a node located at that site before configuring the volume groups.

1. Define the appropriate Volume Groups on each node that will run the application package with the following commands:
# mkdir /dev/vgname
# mknod /dev/vgname/group c 64 0xnn0000
where the name /dev/vgname and the minor number nn are unique within the cluster.
2. Create the Volume Group on the source volume, which uses PVLinks for path failover. Use all the special device file names associated with the Vdisk, as identified in the section “Identifying Special Device Files for PVLinks Configuration”. The following commands are an example of how a volume group using PVLinks is created for the Vdisk identified by WWN 6005-08b4-0010-203d-0000-6000-0017-0000:
# pvcreate -f /dev/rdsk/c16t0d1
# vgcreate /dev/vgname /dev/dsk/c16t0d1
# vgextend /dev/vgname /dev/dsk/c17t0d1
# vgextend /dev/vgname /dev/dsk/c18t0d1
# vgextend /dev/vgname /dev/dsk/c20t0d1
# vgextend /dev/vgname /dev/dsk/c12t0d1
# vgextend /dev/vgname /dev/dsk/c13t0d1
# vgextend /dev/vgname /dev/dsk/c14t0d1
# vgextend /dev/vgname /dev/dsk/c15t0d1
3. De-activate the Volume Groups.
# vgchange -a n /dev/vgname
4. Start the cluster and configure the Volume Groups.
# cmruncl (if cluster is not already up and running)
# vgchange -c y /dev/vgname
5. Test the Volume Group activation with the exclusive option.
# vgchange -a e /dev/vgname
6. Create a backup configuration file that contains the cluster ID (at this point the ID is already present on the disks/LUNs).
# vgcfgbackup /dev/vgname
7. Use the vgexport command with the -p option to export the Volume Groups on the primary system without removing the HP-UX device files.
# vgexport -s -p -m mapfile /dev/vgname Make sure to copy the map files to all of the nodes. The sample script Samples/ ftpit shows a semi-automated way (using ftp) to copy the files. Only enter the password interactively. 8. De-activate the volume group. # vgchange -a n /dev/vgname Importing Volume Groups on Nodes at the Same Site Use the following procedure to import volume groups on cluster nodes located at the same site as the EVA on which you are doing the Logical Volume Manager configuration. The sample script mk2imports can be modified to automate these steps. NOTE: Before running vgimport, it is necessary to create the directory under the /dev directory and create the group file. 1. Define the Volume Groups on all nodes at the same site that will run the Serviceguard package. # mkdir /dev/vgname # mknod /dev/vgname/group c 64 0xnn0000 2. Import the Volume Groups on all nodes at the same site that will run the Serviceguard packages. # vgimport -vs -m mapfile /dev/vgname 3. Activate the Volume Groups and back up the configuration. # vgchange -a e /dev/vgname # vgcfgbackup /dev/vgname Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA 207 See the sample script Samples/mk2imports. 4. De-activate the Volume Groups. # vgchange -a n /dev/vgname NOTE: Exclusive activation must be used for all volume groups associated with packages that use EVA. The design of Metrocluster Continuous Access EVA assumes that only one node in the cluster will have a Volume Group activated at a time. Importing Volume Groups on Nodes at the Remote Site Use the following procedure to import volume groups on all cluster nodes located at the site of the remote EVA. The sample script mk2imports can be modified to automate these steps. 1. Define the Volume Groups on all nodes at the same site that will run the Serviceguard package. # mkdir /dev/vgname # mknod /dev/vgname/group c 64 0xnn0000 2. Import the Volume Groups on all nodes at the same site that will run the Serviceguard packages. # vgimport -vs -m mapfile /dev/vgname 3. Verify the Volume Group configuration with the following procedures: • From the command view EVA, shown in Figure 4-4 failover the DR group to make it the source on the REMOTE site instead of the destination by following the steps described below: a. Select the destination site storage system from the command view EVA. b. Next select the desired Disaster Recovery group and click on “Fail Over”. 4. Activate the Volume Groups and back up the configuration. # vgchange -a e /dev/vgname # vgcfgbackup /dev/vgname See the sample script Samples/mk2imports. 5. De-activate the Volume Groups. # vgchange -a n /dev/vgname 6. 208 From the command view EVA, failback the SOURCE to its original site. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA Figure 4-4 EVA Command View DR Group Properties Building a Metrocluster Solution with Continuous Access EVA Configuring Packages for Automatic Disaster Recovery After completing the following steps, packages will be able to automatically fail over to an alternate node in another data center and still have access to the data that they need in order to operate. This procedure must be repeated on all the cluster nodes for each Serviceguard package so the application can fail over to any of the nodes in the cluster. Customizations include editing an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. 
The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user’s guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, configure and test one package at a time. 1. Create a directory /etc/cmcluster/pkgname for each package: # mkdir /etc/cmcluster/pkgname 2. Create a package configuration file. # cd /etc/cmcluster/pkgname # cmmakepkg -p pkgname.config Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/ pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Building a Metrocluster Solution with Continuous Access EVA 209 3. In the .config file, list the node names in the order in which you want the package to fail over. It is recommended for performance reasons, that you have the package fail over locally first, then to the remote data center. Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time required to obtain status from the EVA. NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is no access from the host to the package disks. This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices due to the time needed to get device status from the EVA. Clusters with multiple packages that use devices on the EVA will all cause package startup time to increase when more than one package is starting at the same time. 4. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set FS_UMOUNT_COUNT to 1. 5. 6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. Refer to the Managing Serviceguard user’s guide for more detailed information on these functions. Copy the environment file template/opt/cmcluster/toolkit/SGCAEVA/ caeva.env to the package directory, naming it pkgname_caeva.env: # cp /opt/cmcluster/toolkit/SGCAEVA/caeva.env \ /etc/cmcluster/pkgdir/pkgname_caeva.env 210 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA NOTE: If not using a package name as a filename for the package control script, it is necessary to follow the convention of the environment file name. This is the combination of the file name of the package control script without the file extension, an underscore and type of the data replication technology (caeva) used. The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen. Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env. 7. Edit the environment file _caeva.env as follows: a. Set the CLUSTER_TYPE variable to METRO if this a Metrocluster. b. 
Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to/etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables. c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred, or Data_Currency_Preferred. d. Set the WAIT_TIMEvariable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster. e. Set the DR_GROUP_NAME variable to the name of DR Group used by this package. This DR Group name is defined when the DR Group is created. f. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 1. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. g. Set the DC1_SMIS_LIST variable to the list of Management Servers which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. h. Set the DC1_HOST_LIST variable to the list of clustered nodes which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. Building a Metrocluster Solution with Continuous Access EVA 211 i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. j. Set the DC2_SMIS_LIST variable to the list of Management Server, which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. k. Set the DC2_HOST_LIST variable to the list of clustered nodes which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds. 8. After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script using the following command (be sure to include the -n option to perform syntax checking only): # sh -n If any messages are returned, it is recommended to correct the syntax errors. 9. Distribute Metrocluster Continuous Access EVA configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. 
Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via .rhosts may create a security issue. 10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname: pkgname.cntl Seviceguard package control script 212 pkgname_caeva.env Metrocluster Continuous Access EVA environment file pkgname.config Serviceguard package ASCII configuration file pkgname.sh Package monitor shell script, if applicable Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA other files Any other scripts used to manage Serviceguard packages 11. Check the configuration using the cmcheckconf -P pkgname.config, then apply the Serviceguard configuration using the cmapplyconf -P pkgname.config command or SAM. The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster Continuous Access EVA. Maintaining a Cluster that Uses Metrocluster Continuous Access EVA While the package is running, a manual storage failover on Continuous Access EVA outside of Metrocluster Continuous Access EVA software can cause the package to halt due to unexpected condition of the Continuous Access EVA volumes. It is recommended that no manual storage failover be performed while the package is running. A manual change of Continuous Access EVA link state from suspend to resume is allowed to re-establish data replication while the package is running. Continuous Access EVA Link Suspend and Resume Modes Upon Continuous Access links recovery, Continuous Access EVA automatically normalizes (the Continuous Access EVA term for “synchronizes”) the source Vdisk and destination Vdisk data. If the log disk is not full, when a Continuous Access connection is re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging. Since write ordering is maintained, the data on the destination Vdisk is consistent while merging is in progress. If the log disk is full, when a Continuous Access connection is re-established, a full copy from the source Vdisk to the destination Vdisk is done. Since a full copy is done at the block level, the data on the destination Vdisk is not consistent until the copy completes. If all Continuous Access links fail and if failsafe mode is disabled, the application package continues to run and writes new I/O to source Vdisk. The virtual log in EVA controller collects host write commands and data; DR group's log state changes from normal to logging. When a DR group is in a logging state, the log will grow in proportion to the amount of write I/O being sent to the source Vdisks. If the links are down for a long time, the log disk may be full, and full copy will happen automatically upon link recovery. If primary site fails while copy is in progress, the data in destination Vdisk is not consistent, and is not usable. To prevent this, after all Continuous Access links fail, it is recommended to manually put the Continuous Access link state to suspend mode by using the Command View EVA UI. When Continuous Access link is in suspend Building a Metrocluster Solution with Continuous Access EVA 213 state, Continuous Access EVA will not try to normalize the source and destination Vdisks upon links recovery until you manually change the link state to resume mode. 
Normal Maintenance
There might be situations when the package has to be taken down for maintenance without having the package move to another node. The following procedure is recommended for normal maintenance of Metrocluster Continuous Access EVA:
1. Stop the package with the appropriate Serviceguard command.
# cmhaltpkg pkgname
2. Distribute the Metrocluster Continuous Access EVA configuration changes.
# cmapplyconf -P pkgname.config
3. Start the package with the appropriate Serviceguard command.
# cmmodpkg -e pkgname
Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that the nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs. See "Example Failover Scenarios with Two Arbitrators" (page 31).
Failback
After resynchronization is complete, halt the package on the failover site, and restart it on the primary site. Metrocluster will then do a failover of the storage, which will trigger Continuous Access EVA to swap the personalities between the source and the destination Vdisks, returning source status to the primary site.
Cluster Re-Configuration
There might be situations when the cluster has to be re-configured for maintenance. The following procedure is recommended for re-configuration of Metrocluster Continuous Access EVA:
1. Before running the cmapplyconf -C command, it is necessary to remove the cluster awareness from the Metrocluster volume groups. This is done by halting all Metrocluster packages with the appropriate Serviceguard command and then running the following on the source side for each Metrocluster volume group:
# vgchange -c n vgname
2. Halt the entire cluster and apply your changes with the Serviceguard command.
# cmapplyconf -C
3. Re-start the cluster and mark the cluster ID on all Metrocluster volume groups. Run the following on the source side:
# vgchange -c y vgname
Completing and Running a Continental Cluster Solution with Continuous Access EVA
The following section describes how to configure a continental cluster solution using Continuous Access EVA, which requires the HP Metrocluster with Continuous Access EVA product.
NOTE: Make sure you have completed the preparation for Metrocluster Continuous Access EVA, as described in the section "Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA", on both the primary and recovery sites.
Setting up a Primary Package on the Primary Cluster
Use the procedures in this section to configure a primary package on the primary cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.
1. Install Continentalclusters on all the cluster nodes in the primary cluster (skip this step if the software has been pre-installed). Run swinstall(1m) to install HP Continentalclusters from an SD depot.
2. When swinstall(1m) has completed, create a directory for the new package in the primary cluster:
# mkdir /etc/cmcluster/pkgname
Create a Serviceguard package configuration file in the primary cluster.
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.ascii
Customize the Serviceguard package configuration file as appropriate to your application.
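For reference, the entries discussed in the remainder of this step might look like the following in the generated package ASCII file. The package and node names are examples only; the complete parameter set is described in the template produced by cmmakepkg.

    PACKAGE_NAME        pkgname
    NODE_NAME           node1
    NODE_NAME           node2
    RUN_SCRIPT          /etc/cmcluster/pkgname/pkgname.cntl
    HALT_SCRIPT         /etc/cmcluster/pkgname/pkgname.cntl
    AUTO_RUN            NO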
Be sure to include the pathname of the control script /etc/ cmcluster// .cntl for the RUN_SCRIPT and HALT_SCRIPT parameters. Completing and Running a Continental Cluster Solution with Continuous Access EVA 215 Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts. Only after the primary packages start, use cmmodpkg to enable package switching on all primary packages. By enabling package switching in the package configuration, it will automatically start the primary package when the cluster starts. However, had there been a primary cluster disaster, resulting in the recovery package starting and running on the recovery cluster, the primary package should not be started until after first stopping the recovery package. 3. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTARTparameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater 4. 5. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. Copy the environment file template: /opt/cmcluster/toolkit/SGCA/caeva.env to the package directory, naming it pkgname_caeva.env: # cp /opt/cmcluster/toolkit/SGCA/caeva.env \/etc/cmcluster/pkgname/pkgname_caeva.env NOTE: If a package name is not used as a filename for the package control script, it is required to follow the convention of the environment file name. This is the combination of the file name of the package control script without the file extension, an underscore and type of the data replication technology (caeva) used. The extension of the file must be env. The following examples demonstrate how the environment file name should be chosen. Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env. 6. 216 Edit the environment file _caeva.env as follows: a. Set the CLUSTER_TYPE variable to CONTINENTAL b. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/ package_name, removing any quotes around the file names. The operator Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA c. d. e. f. g. may create the FORCEFLAG file in this directory. See Appendix B for a description of these variables. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred, or Data_Currency_Preferred. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster. Set the DR_GROUP_NAME variable to the name of DR Group used by this package. This DR Group name is defined when the DR Group is created. 
Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 1. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. Set the DC1_SMIS_LIST variable to the list of Management Servers which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. h. Set the DC1_HOST_LIST variable to the list of clustered nodes which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. j. Set the DC2_SMIS_LIST variable to the list of Management Server, which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. k. Set the DC2_HOST _LISTvariable to the list of clustered nodes which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds. Completing and Running a Continental Cluster Solution with Continuous Access EVA 217 7. Distribute Metrocluster Continuous Access EVA configuration, environment and control script files to other nodes in the cluster by using ftp or rcp. # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname 8. 9. Apply the Serviceguard configuration using the cmapplyconf command or SAM. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname pkgname.cntl Serviceguard package control script pkgname_caeva.env Metrocluster Continuous Access EVA environment file pkgname.ascii Serviceguard package ASCII configuration file pkgname.sh Package monitor shell script, if applicable other files Any other scripts used to manage Serviceguard packages The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster Continuous Access EVA. 10. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover. 11. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time. Setting up a Recovery Package on the Recovery Cluster Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. Use the following steps for the recovery package set up: 1. Install Continentalclusters on all the cluster nodes in the recovery cluster (skip this step if the software has been pre installed). NOTE: Serviceguard should already be installed on all the cluster nodes. 
Run swinstall(1m) to install Continentalclusters from an SD depot.
2. When swinstall(1m) has completed, create a directory for the new package in the recovery cluster.
# mkdir /etc/cmcluster/pkgname
Create a Serviceguard package configuration file in the recovery cluster.
# cd /etc/cmcluster/pkgname
# cmmakepkg -p pkgname.ascii
Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.
Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts. Do not use cmmodpkg to enable package switching on any recovery package. Enabling package switching will automatically start the recovery package. Package switching on a recovery package will be set automatically by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.
3. Create a package control script.
# cmmakepkg -s pkgname.cntl
Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.
NOTE: Some of the control script variables, such as VG and LV, on the recovery cluster must be the same as on the primary cluster. Some of the control script variables, such as FS, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART, are probably the same as on the primary cluster. Some of the control script variables, such as IP and SUBNET, on the recovery cluster are probably different from those on the primary cluster. Make sure that you review all the variables accordingly.
4. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions.
5. Copy the environment file template /opt/cmcluster/toolkit/SGCA/caeva.env to the package directory, naming it pkgname_caeva.env:
# cp /opt/cmcluster/toolkit/SGCA/caeva.env \
/etc/cmcluster/pkgname/pkgname_caeva.env
6. Edit the environment file pkgname_caeva.env as follows:
a. Set the CLUSTER_TYPE variable to CONTINENTAL.
b. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.
c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred or Data_Currency_Preferred.
d. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.
e. Set the DR_GROUP_NAME variable to the name of the DR Group used by this package. This DR Group name is defined when the DR Group is created.
f.
Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 1. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. g. Set the DC1_SMIS_LIST variable to the list of Management Servers which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. h. Set the DC1_HOST_LISTvariable to the list of clustered nodes which resides in Data Center 1. Multiple names are defined using a comma as a separator between the names. i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system which resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from command view EVA UI. j. Set the DC2_SMIS_LIST variable to the list of Management Server, which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified. k. Set the DC2_HOST _LIST variable to the list of clustered nodes which resides in Data Center 2. Multiple names are defined using a comma as a separator between the names. l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds. 7. Distribute Metrocluster Continuous Access EVA configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. 220 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via .rhosts may create a security issue. 8. 9. Apply the Serviceguard configuration using the cmapplyconf command or SAM. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname: bkpbkgname.cntl Serviceguard package control script bkpkgname_caeva.env Metrocluster Continuous Access EVA environment file bkpkgname.ascii Serviceguard package ASCII configuration file bkpkgname.sh Package monitor shell script, if applicable other files Any other scripts you use to manage Serviceguard packages 10. Make sure the packages on the primary cluster are not running. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg) test the recovery cluster for cluster and package startup and package failover. 11. Any running package on the recovery cluster that has a counterpart on the primary cluster should be halted at this time. Setting up the Continental Cluster Configuration The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: “Designing a Continental Cluster”. 1. 
Generate the Continentalclusters configuration using the following command: # cmqueryconcl -C cmconcl.config 2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. When data replication is done using Continuous Access EVA, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step. 3. 4. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software. On all nodes in both clusters copy the monitor package files from /opt/cmconcl/ scripts to/etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the Completing and Running a Continental Cluster Solution with Continuous Access EVA 221 5. AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages. The monitor package should start automatically when the cluster is formed. Apply the monitor package to both cluster configurations. # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config 6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/ cmclconfig nor is there an equivalent file for Continentalclusters. Example: # cmapplyconcl -C cmconcl.config 7. Start the monitor package on both clusters. NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status. 8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file. 9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster. 10. View the status of the continental cluster primary and recovery clusters, including configured event data. # cmviewconcl -v The continental cluster is now ready for testing. See “Testing the Continental Cluster” (page 91). Switching to the Recovery Cluster in Case of Disaster It is vital the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following: • • • 222 During an alert, the cmrecovercl will not start the recovery packages unless the -foption is used. During an alarm, the cmrecovercl will start the recovery packages without the -f option. When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This condition applies not only when no alert or alarm was issued, but also applies to the situation where there was an alert or alarm, but the primary cluster recovered and its current status is Up. 
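As a quick reference, a typical sequence after a failure has been verified might look like this, run on the recovery cluster:
# cmviewconcl -v
# cmrecovercl
If only an alert was received but, after manual verification, recovery is still required, force the recovery:
# cmrecovercl -f
When cmrecovercl successfully starts the recovery packages, it also enables package switching for them.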
Failover to Recovery Site
After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The recovery package control script evaluates the status of the DR group used by the package and fails the DR group over to the EVA in the recovery site. After a successful failover, the DR group in the recovery site's EVA becomes the source and is accessible in read/write mode.
NOTE: If the Continuous Access links between the two EVAs are down, the recovery package will only start up if one of the following conditions is true:
• The package failover policy variable DT_APPLICATION_STARTUP_POLICY in the package's environment file is set to Availability_Preferred.
• The package failover policy variable DT_APPLICATION_STARTUP_POLICY in the package's environment file is set to Data_Currency_Preferred, and a FORCEFLAG file exists in the package directory.
After the recovery package is up and running, the EVA in the recovery site will have more current data than the one in the primary site.
Failover Scenarios
The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as hardware or networking failures in the connections between the two sites. The following scenarios address some of those failures and suggest recovery approaches applicable to environments using data replication provided by HP StorageWorks EVA series disk arrays and Continuous Access.
Scenario 1
The primary site has lost power for a prolonged time, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the EVA disk array or the operating systems of the systems at the primary site.
Failback to the Primary Site
In this scenario, the EVA in the primary site is down due to the loss of power; therefore, the storage configuration information and the application data prior to the power failure remain intact in the EVA. When the primary site's power is restored, the EVA is up and running, and the Continuous Access links are up, Continuous Access EVA software will automatically resynchronize the data from the recovery site's EVA back to the primary site's EVA. If the resynchronization is a full copy operation, the data in the primary site's EVA is not consistent and is not usable until the full copy (resynchronization) completes. It is recommended to wait until the resynchronization is complete before failing back the packages to the primary site. The state of the DR group in the primary site's EVA can be checked either via Command View (CV) EVA or with the SSSU command. If the state of each Vdisk in the DR group is shown as "Normal", the resynchronization is complete, and the user can move the packages back to the primary site.
Scenario 2
The primary site HP StorageWorks EVA disk array experienced a catastrophic hardware failure and all data was lost on the array.
Failback to the Primary Site
In this scenario the disk array is repaired or a new EVA array is commissioned at the primary site.
Before the application can fail back to the primary site, the EVA in the recovery site (now the source storage) needs to establish the replication relationship with the new EVA in the primary site (now the destination storage). Refer to the procedure named "Return Operations to Replaced New Storage Hardware" in the "Continuous Access EVA Operation Guide" to rebuild the DR groups configured in the EVA. Once the DR groups are rebuilt and the destination storage is synchronized with the source storage, the packages can be failed back to the primary site.
Scenario 3
The primary site has lost power, which affects only the systems in the primary cluster. The primary cluster is down, but the EVA disk array and the Continuous Access links to the recovery site are up and running.
Failback in Scenario 3
In this scenario the EVA disk arrays in both sites are up and running, and the Continuous Access links are functional. When the recovery packages are up and running on the recovery site, Continuous Access EVA automatically switches the replication direction; the new data written on the recovery site's EVA is replicated to the primary site's EVA. After the primary cluster is back online, the packages can be failed back to the primary site.
Reconfiguring Recovery Group Site Identities in Continentalclusters after a Recovery
Consider a disaster scenario in which the primary site goes out of operation but there is no loss of data on the disk array or the servers. After the recovery is completed, the recovered application can continue to run at the recovery site without having to fail back when the primary cluster becomes available at a later point in time. This avoids further downtime for the recovered application. However, it is also desirable for the applications to have the same level of recovery capability at their new site as they had at their original primary site.
As described in the above scenario, Continentalclusters can be reconfigured to provide monitoring and recovery for the application now running on its recovery cluster. This is done by switching the identities of the sites in the application's context; that is, the old (original) primary site becomes the recovery site and the old (original) recovery site becomes the primary site. This type of reconfiguration for Continentalclusters is possible only in a two-cluster, two-site configuration.
Continentalclusters solutions using HP StorageWorks EVA disk arrays require no disk array replication tasks during the reconfiguration. Once the primary site EVA disk array comes back online, HP StorageWorks EVA Continuous Access will automatically resynchronize the data, making the recovery site the "source" and the old primary site the "destination".
Use the cmswitchconcl command (only in a two-cluster configuration) to swap the site identities for all or a selected application's recovery group, so that the applications can now be monitored and recovered from what was originally their primary cluster.
5 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
EMC Symmetrix disk arrays with the Symmetrix Remote Data Facility (EMC SRDF) allow the configuration of physical data replication solutions that provide disaster tolerance for Serviceguard clusters over long distances.
This chapter describes the EMC SRDF software and the additional files that integrate the EMC with Serviceguard clusters. It then shows how to configure both metropolitan and continental cluster solutions using EMC SRDF. The topics discussed in this chapter are: • • • • • • • Files for Integrating Serviceguard with EMC SRDF Overview of EMC and SRDF Concepts Preparing the Cluster for Data Replication Building a Metrocluster Solution with EMC SRDF Metrocluster with SRDF/Asynchronous Data Replication Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication Building a Continental Cluster Solution with EMC SRDF Metrocluster also defines a Site Aware Disaster Tolerant Architecture for application workloads such as Oracle Database 10gR2 RAC. This solution uses an additional software feature called the Site Controller package to provide disaster tolerance for workload databases. This solution is currently implemented for the Oracle Database 10gR2 RAC. For more information on the site aware disaster tolerant architecture, see “Overview of Site Aware Disaster Tolerant Architecture” (page 323). Files for Integrating Serviceguard with EMC SRDF Metrocluster is a set of executable programs, scripts and an environment file that work in an Serviceguard cluster to automate failover to alternate nodes in the case of disaster in a metropolitan cluster. The Metrocluster with EMC SRDF product contains the following files: Files for Integrating Serviceguard with EMC SRDF 227 Table 5-1 Metrocluster with EMC SRDF Template Files Name Description /opt/cmcluster/toolkit/SGSRDF/srdf.env The Metrocluster with EMC SRDF environmental file. This file must be customized for the specific EMC Symmetrix, and HP 9000 and, HP Integrity Servers host system configuration. Copies of this file must be customized for each separate Serviceguard package. /opt/cmcluster/toolkit/SGSRDF/samples A directory containing sample convenience shell scripts that must be edited before using. These shell scripts may help to automate some configuration tasks. These scripts are contributed, and not supported. /usr/sbin/DRCheckDiskStatus The script that checks for a specific environment file in the package directory and should not be edited. /usr/sbin/DRCheckSRDFDevGrp The program that manages the SRDF device group that is used by the package. Metrocluster with EMC SRDF has to be installed on all nodes that will run a Serviceguard package that accesses data on an EMC Symmetrix where the data are replicated to a second Symmetrix using the SRDF facility. In the event of node failure, the integration of Metrocluster with EMC SRDF with the package will allow the application to fail over in the following ways: • • Among local host systems that are attached to the same EMC Symmetrix. Between one system that is attached locally to its EMC Symmetrix and another “remote” host that is attached locally to the other EMC Symmetrix. Metrocluster with Symmetrix SRDF is specifically for configuring one or more Serviceguard packages whose data reside on EMC Symmetrix ICDAs (Integrated Cache Disk Arrays) and replicated with SRDF (Symmetrix Remote Data Facility). Metrocluster with Symmetrix SRDF can be used in metropolitan cluster configuration. The distance between the two data centers is limited by the distance of the Symmetrix arrays physical connection requirements, and the distance of Serviceguard heartbeat round-trip time (0 < 200 ms), whichever is less. 
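Since a copy of srdf.env must be customized for every package, a typical first step when configuring a package is to copy the template into the package directory. The package name below is hypothetical, and the file name assumes the same naming convention used for the environment files earlier in this guide (the control script base name followed by the replication type and the .env extension):
# cp /opt/cmcluster/toolkit/SGSRDF/srdf.env /etc/cmcluster/pkgname/pkgname_srdf.env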
Symmetrix configurations can be either 1 by 1 (one Symmetrix at each data center) or M by N (one or two Symmetrix frames at each data center). Configuration of Metrocluster with EMC SRDF must be done on all the cluster nodes, as is done for any other Serviceguard package. To use Metrocluster with EMC SRDF, Symmetrix host-based software for control and status of the EMC Symmetrix disk arrays must also be installed and configured on each HP 9000 and HP Integrity Servers host system that would execute the application package. 228 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Overview of EMC and SRDF Concepts EMC and Symmetrix Remote Data Facility (SRDF) is a Symmetrix-based business continuance and disaster recovery solution. SRDF is a configuration of Symmetrix systems, the purpose of which is to maintain multiple, real-time copies of logical volume data in more than one location. The Symmetrix systems can be in the same room, in different buildings within the same campus, or hundreds of kilometers apart. By maintaining real-time copies of data in different physical locations, SRDF enables the following operations with minimal impact on normal business operations: • • • • • Disaster Recovery Recovery from planned outages Remote backup Data center migration Data Replication and Mobility Figure 5-1 EMC R1 and R2 Definitions Symmetrix Array Symmetrix Array B1 R1 B2 R2 Optional BVCs R1a SRDF link may be bidirectional for different disk devices There may be multiple R1/R2 devices Data Center A Packages with primary nodes in this data center. See this Symmetrix as the R1 side and the Symmetrix in Data Center B as the R2 side. R2 B2 R1 B1 R2a Optional BVCs Data Center B Packages with primary nodes in this data center. See this Symmetrix as the R1 side and the Symmetrix in Data Center A as the R2 side. Preparing the Cluster for Data Replication When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over. Use the convenience scripts in the /opt/cmcluster/toolkits/SGSRDF/Samples to automate some of the tasks in the following sections: • • • mk3symgrps.nodename —to create EMC Symmetrix device groups mk4gatekpr.nodename— to create gatekeeper devices mk2imports— to import volume groups Overview of EMC and SRDF Concepts 229 • • • ftpit— to copy the configuration to other nodes in the cluster pre.cmquery— to split SRDF links before applying the package configuration post.cmapply— to restore SRDF links after applying the package configuration These scripts should be copied from /opt/cmcluster/toolkits/SGSRDF to another directory, such as /etc/cmcluster/SRDF. Installing the Necessary Software Before any configuration can begin, make sure the following software is installed on all nodes: • • • Symmetrix EMC Solutions Enabler software allows the management of the Symmetrix disks from the node. Symmetrix PowerPath software should be installed if you are building an M by N configuration using PowerPath. However, if you are building an M by N configuration using RDF Enginuity Consistency Assist (RDF-ECA), you need to install only Symmetrix EMC Solutions Enabler. You do not have to install any other software. Metrocluster with Symmetrix SRDF should be installed according to the instructions in the Metrocluster with EMC SRDF Release Notes. NOTE: For Metrocluster/SRDF version A.05.01 and earlier, M by N configurations using PowerPath only are supported. 
As a result, the PowerPath software is a prerequisite for using an M by N configuration with Metrocluster. Building the Symmetrix CLI Database The Symmetrix CLI (Command Line Interface) should be installed on all nodes running packages that use data on the EMC Symmetrix disk arrays. Create the EMC Solutions Enabler database on each system using the following steps. (Refer to the Symmetrix EMC Solutions Enabler manual). Issue the following command on each node after the hardware is installed. # symcfg discover This builds the CLI database on the node. Display what is in the EMC Solutions Enabler database. • • • symdg list symld -g symdevgrpname list symgate list If the EMC Solutions Enabler database is not configured, the following error message will be displayed: The Symmetrix configuration could not be loaded for a locally attached Symmetrix 230 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF NOTE: Do not set the SYMCLI_SID and SYMCLI_DG environment variables before running thesymcfg command. These environment variables limit the amount of information gathered when the EMC Solutions Enabler database is created, and therefore will not be a complete database. Also, the SYMCLI_OFFLINE variable should not be set since this environment variable disables the command line interface. Determining Symmetrix Device Names on Each Node To correctly specify the device file names when creating Symmetrix device groups, be sure to map the HP-UX device files to the R1 and R2 Symmetrix devices. Use the following steps to gather the necessary information: 1. Obtain a list of data for the Symmetrix devices available, using the following command on each node without any options: # syminq Sample output from both the R1 and R2 sides is shown in Figure 5-2 and Figure 5-3. Figure 5-2 Sample syminq Output from a Node on the R1 Side Device Name /dev/rdsk/c0t0d0 /dev/rdsk/c0t0d1 /dev/rdsk/c0t0d2 /dev/rdsk/c0t0d3 /dev/rdsk/c0t1d0 /dev/rdsk/c0t1d1 /dev/rdsk/c0t1d0 /dev/rdsk/c0t1d1 /dev/rdsk/c0t1d2 /dev/rdsk/c1t2d0 /dev/rdsk/c1t2d1 /dev/rdsk/c1t2d2 Type Product Vendor ID Rev Ser Num Cap(KB) R1 R1 R2 R2 BCV BCV GK GK GK R1 R1 R2 EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 95004160 95005160 95006160 95007160 95024160 95025160 95040160 95041160 95042160 95004320 95005320 95006320 4418880 4418880 4418880 4418880 4418880 4418880 2880 2880 2880 4418880 4418880 4418880 Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Preparing the Cluster for Data Replication 231 Figure 5-3 Sample syminq Output from a Node on the R2 Side Device Name /dev/rdsk/c4t0d0 /dev/rdsk/c4t0d1 /dev/rdsk/c4t0d2 /dev/rdsk/c4t0d3 /dev/rdsk/c4t1d0 /dev/rdsk/c4t1d1 /dev/rdsk/c4t1d0 /dev/rdsk/c3t1d1 /dev/rdsk/c3t1d0 /dev/rdsk/c3t1d1 /dev/rdsk/c3t3d0 /dev/rdsk/c3t3d1 /dev/rdsk/c3t3d2 2. 
Type Product Vendor ID Rev Ser Num Cap(KB) R2 R2 R1 R1 BCV BCV GK GK BCV BCV R2 R2 R1 EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC EMC 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 5264 50014321 50015321 50016321 50017321 50034321 50035321 50040321 50041321 50030161 50031161 50004161 50005161 50006161 4418880 4418880 4418880 4418880 4418880 4418880 2880 2880 4418880 4418880 4418880 4418880 4418880 Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix Symmetrix The following information is needed from these listings for each Symmetrix logical device: • HP-UX device file name (for example, /dev/rdsk/c3t3d2). • Device type (R1, R2, BCV, GK, or blank) • Symmetrix serial number (for example, 50006161), useful in matching the HP-UX device names to the actual devices in the Symmetrix configuration downloaded by EMC support staff. This number is further explained in Figure 5-4. Figure 5-4 Parsing the Symmetrix Serial Number } } } 50 006 161 Symmetrix ID unique device number host adapter and port numbers — The Symmetrix ID is the same as the last two digits of the serial number of the Symmetrix frame, in this example50. — The next three hexadecimal digits are the unique Symmetrix device number that is seen in the output of the status command: # symrdf -g symdevgrpname query 232 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF This is used by the Metrocluster with Symmetrix SRDF control script and saved in the file /etc/cmcluster/package_name/symrdf.out. The contents of this file may be useful for debugging purposes. — The next three digits indicate the Symmetrix host adapter (SA or FA) and port numbers; this is useful to see multiple host links to the same Symmetrix device. For example, PV links will show up as two HP-UX device file names with the same device number, but with different host adapter and port numbers. 3. Use the symrdf command on each Symmetrix disk array (that is, from both the R1 and the R2 side) to pair the logical device names for the R1 and R2 sides of each SRDF link: # symrdf list Sample output is shown in Figure 5-5 and Figure 5-6. NOTE: The format of output varies depending on the symrdf version. Sample symrdf list Output from R1 Side Symmetrix ID: 000187400684 Local Device View ------------------------------------------------------------------------STATUS MODES RDF S T A T E S Sym RDF --------- ----- R1 Inv R2 Inv ---------------------Dev RDev Typ:G SA RA LNK MDA Tracks Tracks Dev RDev Pair ---- ---- ------ --------- ----- ------- ------- --- ---- ------------0196 0197 0198 0199 019A 019B 019C 019C 0012 0013 0014 0015 0016 0017 0018 0019 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW S.. S.. S.. S.. S.. S.. S.. S.. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RW WD Synchronized RW WD Synchronized RW WD Synchronized RW WD Synchronized RW WD Synchronized RW WD Synchronized RW WD Synchronized 0 RW WD Synchronized Preparing the Cluster for Data Replication 233 Figure 5-5 Sample symrdf list Output from R1 Side Local Device View STATUS Sym RDF MODES RDF S T A T E S -------- ----- --------- R1 Ivn Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks 000 001 004 005 006 007 008 009 000 001 004 005 006 007 008 009 OFF OFF OFF OFF OFF OFF OFF OFF R1:1 R1:1 R1:1 R1:1 R2:2 R2:2 R1:1 R1:1 ?? ?? 
RW RW RW RW RW RW RW RW RW RW WD WD RW RW RW RW RW RW RW RW RW RW SYN SYN SYN SYN SYN SYN SYN SYN DIS DIS DIS DIS DIS DIS DIS DIS 0 0 0 0 0 0 0 0 R2 Ivn --------------Tracks Dev RDev Pair 0 0 0 0 0 0 0 0 RW RW RW RW WD WD RW RW NR NR WD WD RW RW WD WD Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Sample symrdf list Output from R2 Side Local Device View ------------------------------------------------------------------------STATUS MODES RDF S T A T E S Sym RDF --------- ----- R1 Inv R2 Inv ---------------------Dev RDev Typ:G SA RA LNK MDA Tracks Tracks Dev RDev Pair ---- ---- ------ --------- ----- ------- ------- --- ---- ------------0012 0013 0014 0015 0016 0017 0018 0019 234 0196 0197 0198 0199 019A 019B 019C 019D R2:13 R2:13 R2:13 R2:13 R2:13 R2:13 R2:13 R2:13 WD WD WD WD WD WD WD WD WD WD WD WD WD WD WD WD RW RW RW RW RW RW RW RW S.. S.. S.. S.. S.. S.. S.. S.. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WD RW WD RW WD RW WD RW WD RW WD RW WD RW 0 WD RW Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Figure 5-6 Sample symrdf list Output from R2 Side Local Device View STATUS Sym 4. RDF MODES RDF S T A T E S -------- ----- --------- R1 Ivn Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks 000 001 004 005 006 007 008 000 001 004 005 006 007 008 OFF OFF OFF OFF OFF OFF OFF R2:1 R2:1 R2:1 R2:1 R1:2 R1:2 R2:1 NR NR RW RW RW RW RW WD WD WD WD RW RW WD RW RW RW RW RW RW RW SYN SYN SYN SYN SYN SYN SYN DIS DIS DIS DIS DIS DIS DIS R2 Ivn --------------Tracks Dev RDev Pair 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NR NR WD WD RW RW WD RW RW RW RW WD WD RW Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Synchronized Match the logical device numbers in the symrdf listings with the HP-UX device file names in the output from the syminq command. This displays which devices are seen from each node to ensure this node can see all necessary devices. Use the Symmetrix ID to determine which Symmetrix array is connected to the node. Then use the Symmetrix device number to determine which devices are in the same logical device seen by each node that is connected to the same Symmetrix unit. Record the HP-UX device file names in your table. Table 5-2 shows a partial mapping for a 4 node cluster connected to two Symmetrix arrays (95 and 50). There may be many R1 and R2 devices and many gatekeepers for each package, so this table will be much larger for most clusters. Also, with M by N configurations, the number of devices increases according to the number of Symmetrix frames. 
Table 5-2 Mapping for a 4 Node Cluster connected to 2 Symmetrix Arrays Symmetrix ID, device #, and type Node 1 /dev/rdsk Node 2 Node 3 /dev/rdsk Nodes 4 /dev/rdsk device file name /dev/rdsk device device file name device file name file name ID 95 c0t4d0 Dev# 005 Type R1 ID 50 Dev# 014 Type R2 ID 95 Dev# 00A Type R2 c6t0d0 c4t0d0 c0t4d0 c0t2d2 c0t4d2 Preparing the Cluster for Data Replication 235 Table 5-2 Mapping for a 4 Node Cluster connected to 2 Symmetrix Arrays (continued) Symmetrix ID, device #, and type ID 50 Dev# 012 Type R1 ID 95 Dev# 040 Type GK ID 50 Dev# 041 Type GK ID 95 Dev# 028 Type BCV Node 1 /dev/rdsk Node 2 Node 3 /dev/rdsk Nodes 4 /dev/rdsk device file name /dev/rdsk device device file name device file name file name c3t0d2 c4t3d2 c0t15d0 c0t15d0 c3t15d1 c5t15d1 c4t3d2 c4t3d2 n/a n/a NOTE: The Symmetrix device number may be the same or different in each of the Symmetrix units for the same logical device. In other words, the device number for the logical device on the R1 side of the SRDF link may be different from the device number for the logical device on the R2 side of the SRDF link. The Symmetrix logical device numbers in these examples were configured to be the same number so the cluster is easier to manage. If reconfiguring an existing cluster, the Dev and RDev devices will probably not be the same number. When determining the configuration for the Symmetrix devices for a new installation, it is recommended to use the same Symmetrix device number for both the R1 and R2 devices. It is also recommended the same target and LUN number be configured for all nodes that have access to the same Symmetrix logical device. Building a Metrocluster Solution with EMC SRDF Setting up 1 by 1 Configurations The most common Symmetrix configuration used with Metrocluster with EMC SRDF is a 1 by 1 configuration in which there is a single Symmetrix frame at each Data Center. This section describes how to set up this configuration using EMC Solutions Enabler and HP-UX commands. It is assumed the Symmetrix CLI database is already set up 236 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF on each node, as described in the previous section “Preparing the Cluster for Data Replication.” A basic 1 by 1 configuration is shown in Figure 5-7, which is a graphical view of the data in Table 5-2. Figure 5-7 Mapping HP-UX Device File Names to Symmetrix Units Node 1 Node 2 /dev /de /rdsk/c0t 4d0 /de v/rdsk /de v/rds /c0t2d 2 v/r k/c ds 0t1 k/c 5d 0 4t3 d2 d0 t0 c6 d2 / sk 0t4 /rd /c 0 ev dsk 15d /d ev/r k/c0t s /d v/rd /de dsk/c4t3d2 /dev/r Symmetrix ID 95 R1 R2 GK BCV 0 c4t0d rdsk/ 2 t0d 1 3 c sk/ t15d d r / 3 v /c /de dsk v/r e /d Symmetrix ID 50 SRDF Data Center A R2 R1 GK /dev/ Node 3 /d ev / v/r rdsk ds /c /de k/c 0t v/rd 4 sk/c 4t3d d0 5t1 2 5d1 Node 4 /de Data Center B Creating Symmetrix Device Groups A single Symmetrix device group must be defined for each package on each node that is connected to the Symmetrix. The following procedure must be done on each node that may potentially run the package: NOTE: The sample scripts mk3symgrps.nodename can be modified to automate these steps. 1. Use the symdg command, or modify the mk3symgrps.nodename script to define an R1 and an R2 device group for each package. # symdg create -type RDF1 devgroupname Issue the above command on nodes attached to the R1 side. # symdg create -type RDF2 devgroupname Issue the above command on nodes attached to the R2 side. 
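For example, for a hypothetical package pkgA, the device group might be created as follows (the group name dgpkgA is arbitrary):
On the nodes attached to the R1 side:
# symdg create -type RDF1 dgpkgA
On the nodes attached to the R2 side:
# symdg create -type RDF2 dgpkgA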
The group name must be the same on each node on the R1 and R2 side. The devgroup name used will be later placed in variable DEVICE_GROUP defined in pkg.env file. 2. Use the symld command to add all LUNs that comprise the Volume Group for that package on that host. The HP-UX device file names for all Volume Groups that belong to the package must be defined in one Symmetrix device group. All Building a Metrocluster Solution with EMC SRDF 237 devices belonging to Volume Groups that are owned by an application package must be added to a single Symmetrix device group. # symld -g devgroupnameadd dev devnumber1 # symld -g devgroupnameadd dev devnumber2 At this point, it will be helpful to refer to Table 5-2 (page 235). Although, the HP-UX device file names on each node specified may be different, the device group must be the same on each node. When creating the Symmetrix device groups, specify only one HP-UX path to a particular Symmetrix device. Do not specify alternate paths (PVLinks). The EMC Solutions Enabler uses the HP-UX path only to determine to which Symmetrix device you are referring. The Symmetrix device may be added to the device group only once. NOTE: Symmetrix Logical Device names must be the default names of the form DEVnnn (for example, DEV001). Do not use this option for creating your own device names. The script must be customized for each system including: • • • Particular HP-UX device file names. Symmetrix device group name (an arbitrary, but unique name may be chosen for each group that defines all of the volume groups (VGs), which belong to a particular Serviceguard package). Keyword RDF1 or RDF2. Configuring Gatekeeper Devices Gatekeeper devices must be unique per Serviceguard package to prevent contention in the Symmetrix when commands are issued, such as two or more packages starting up at the same time. Gatekeeper devices are unique to a Symmetrix unit. They are not replicated across the SRDF link. Gatekeeper devices are marked GK in the syminq output, and are usually 2880 KB in size. NOTE: The sample scripts mk4gatekpr.nodename can be modified to automate these steps. 1. Define at least two gatekeepers per package per node (assuming PV links are used). They will only be available for use by that node. Each gatekeeper device is configured on different physical links. # symgate -sid sidnumber1 define dev devnumber1 238 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF # symgate -sid sidnumber2 define dev devnumber2 2. Associate the gatekeeper devices with the Symmetrix device group for that package. # symgate -sid sidnumber1 -g devgroupname \associate dev devnumber1 # symgate -sid sidnumber2 -g devgroupname \associate dev devnumber2 3. Define a pool of four or more additional gatekeeper devices that are not associated with any particular node. The EMC Solutions Enabler will switch to an alternate gatekeeper device if the path to the primary gatekeeper device fails. Verifying the EMC Symmetrix Configuration When finished with all these steps, use the symrdf list command to get a listing of all devices and their states. Back up the EMC Solutions Enabler database on each node, so that these configuration steps do not have to be repeated if a failure corrupts the database. The EMC Solutions Enabler database is a binary file located in the directory /var/symapi/db. Creating and Exporting Volume Groups Use the following procedure to create volume groups and export them for access by other nodes. 
The sample script mk1VGsin the /opt/cmcluster/toolkit/SGSRDF/ Samples directory can be modified to automate these steps. 1. Define the appropriate Volume Groups (VGs) on each node that run the application package. # mkdir /dev/vgxx # mknod /dev/vgxx/group c 64 0xnn0000 where the name /dev/vgxx and the number nn are unique within the cluster. 2. Create volume groups only on the primary system. Use the vgcreate and vgextend commands, specifying the appropriate HP-UX device file names. # vgcreate vgname /dev/dsk/cxtydz # vgextend vgname /dev/dsk/cxtydz 3. Use the vgchangecommand to de-activate the volume group and use the vgexport command with the -p option to export the VGs on the primary system without removing the HP-UX device files: # vgchange -a n vgname # vgexport -v -s -p -m mapfilename vgname Building a Metrocluster Solution with EMC SRDF 239 Copy the map files to all of the nodes. The sample script Samples/ftpit shows a semi-automated way (using ftp) to copy the files. Enter the password interactively. Importing Volume Groups on Other Nodes Use the following procedure to import volume groups. The sample script mk2imports can be modified to automate these steps: 1. Import the VGs on all of the other systems that might run the Serviceguard package and backup the LVM configuration. Make sure that you split the logical SRDF links before importing the VGs, especially if you are importing the VGs on the R2 side. # symrdf -g devgrpname split -v # vgimport -v -s -m mapfilename vgname 2. Back up the configuration. # vgchange -a y vgname # vgcfgbackup vgname # vgchange -a n vgname # symrdf -g devgrpname establish -v See the sample script Samples/mk2imports. NOTE: Exclusive activation must be used for all volume groups associated with packages that use the EMC. The design of Metrocluster with EMC SRDF assumes that only one system in the cluster will have a VG activated at a time. Configuring PV Links The examples in the previous sections describe the use of thevgimport and vgexport commands with the -s option. In addition, the mk1VGs script uses a -s in the vgexport command, and the mk2imports script uses a -s in the vgimport command. Optionally, remove this option from both commands if using PV links. The -s option to the vgexport command saves the volume group id (VGID) in the map file, but it does not preserve the order of PV links. To specify the exact order of PV links, do not use the -s option with vgexport, and in the vgimport command, enter the individual links in the desired order, as in the following example: # vgimport -v -m mapfilename vgname linkname1 linkname2 Grouping the Symmetrix Devices at Each Data Center The use of R1/R2 devices in M by N configurations of multiple Symmetrix frames is enabled by means of consistency groups. A consistency group is a set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database. 240 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Because Metrocluster with EMC SRDF works at the device group level, the consistency group is implemented and managed as a single device group even though it spans multiple Symmetrix frames. Consistency groups are created using either EMC PowerPath or RDF Enginuity Consistency Assist (RDF-ECA) feature of Solutions Enabler. In a consistency group, Symmetrix tracks the I/Os that are written to the devices. 
If an I/O cannot be written to a remote Symmetrix because a remote device or an RDF link has failed, the data flow to the other Symmetrix will be halted in less than one second. Once mirroring is resumed, any updates to the data is propagated with normal SRDF operation. Figure 5-8 shows when there is a break in the links between two of the Symmetrix frames, the use of consistency groups (depicted as dashed oval lines) ensures that the other two links are also suspended. Figure 5-8 2 X 2 Node and Data Center Configuration with Consistency Groups When these links both go down... Data Center A node 1 x Data Center B node 3 x pkg A pkg C These links are suspended by EMC PowerPath... node 4 pkg B pkg D node 2 Third Location (Arbitrators) node 5 node 6 Building a Metrocluster Solution with EMC SRDF 241 Setting up M by N Configurations Metropolitan clusters using EMC SRDF can be built in configurations that use more than two EMC Symmetrix disk arrays. In such configurations, M arrays located in Data Center A may be connected to N arrays located in Data Center B. This section describes how to set up an M by N configuration using EMC Solutions Enabler and HP-UX commands. It is assumed that either Symmetrix PowerPath software is installed or RDF-ECA feature of Solutions Enabler is enabled on all nodes and the Symmetrix CLI database on each node has already been setup, as described in the section, “Preparing the Cluster for Data Replication” (page 229). CAUTION: M by N configurations cannot be used with R1/R2 swapping. Figure 5-9 depicts a 2 by 2 configuration. Data in this figure are used in the example commands given in the following sections. This example shows R1 devices at one data center and R2 devices with Business Continuity Volumes (BCVs) at the other. However, a bidirectional configuration is also possible, with R1 devices on both sites. Figure 5-9 Devices and Symmetrix Units in M by N Configurations SYMMETRIX A Node1 Gatekeeper /dev/rdsk/c5t0d0 (010) R1 Devices /dev/rdsk/c6t0d0 (00C) /dev/rdsk/c6t0d1 (00D) Sim ID 638 Node2 SYMMETRIX C Gatekeeper /dev/rdsk/c7t0d0 (002) R2 Devices /dev/rdsk/c8t0d0 (018) /dev/rdsk/c8t0d1 (019) Channel Sim ID 021 SYMMETRIX B Gatekeeper /dev/rdsk/c5t0d1 (009) R1 Devices /dev/rdsk/c5t0d2 (010) /dev/rdsk/c5t0d3 (011) Sim ID 130 Node3 BCV Devices /dev/rdsk/c8t0d2 (01A) SRDF/ /dev/rdsk/c8t0d3 (01B) Fibre SYMMETRIX D Node4 Gatekeeper /dev/rdsk/c6t0d0 (00B) R2 Devices /dev/rdsk/c9t0d0 (050) /dev/rdsk/c9t0d1 (051) BCV Devices /dev/rdsk/c9t0d2 (052) /dev/rdsk/c9t031 (053) Sim ID 363 Creating Symmetrix Device Groups For each node on the R1 side (node1 and node2), create the device groups as follows. Note: It is necessary to create two device groups since device groups do not span frames. The following examples are based on the configuration shown in Figure 5-9. 1. Create device groups using the following commands on each node on the R1 side. # symdg -type RDF1 create dgoraA 242 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF # symdg -type RDF1 create dgoraB 2. For each node on the R2 side (node3 and node4), create the device groups as follows. Note: It is necessary to create two device groups since device groups do not span frames. Do the following on each node on the R2 side. # symdg -type RDF2 create dgoraA # symdg -type RDF2 create dgoraB 3. For each node on the R1 side (node1 and node2), assign the R1 devices to the device groups. 
# symld -sid 638 -g dgoraA add dev 00C # symld -sid 638 -g dgoraA add dev 00D # symld -sid 130 -g dgoraB add dev 010 # symld -sid 130 -g dgoraB add dev 011 4. For each node on the R2 side (node3 and node4), assign the R2 devices to the device groups. # symld -sid 021 -g dgoraA add dev 018 # symld -sid 021 -g dgoraA add dev 019 # symld -sid 363 -g dgoraB add dev 050 # symld -sid 363 -g dgoraB add dev 051 5. On each node on the R2 side (node3 and node4), associate the local BCV devices to the R2 device group. # symbcv -g dgoraA add dev 01A # symbcv -g dgoraA add dev 01B # symbcv -d dgoraB add dev 052 # symbcv -d dgoraB add dev 053 6. To manage the BCV devices from the R1 side, it is necessary to associate the BCV devices with the device groups that are configured on the R1 side. Use the following commands on hosts directly connected to the R1 Symmetrix. # symbcv -g dgoraA associate dev 01A -rdf # symbcv -g dgoraA associate dev 01B -rdf # symbcv -g dgoraB associate dev 052 -rdf # symbcv -g dgoraB associate dev 053 -rdf 7. Establish the BCV devices using the following commands from the R2 side. # symmir -g dgoraA -full est Building a Metrocluster Solution with EMC SRDF 243 # symmir -g dgoraB -full est 8. Alternatively, establish the BCV devices with the following commands from the R1 side. # symmir -g dgoraA -full est -rdf # symmir -g dgoraB -full est -rdf Configuring Gatekeeper Devices It is necessary to have a gatekeeper device for each device group in the consistency group that will be built in a later step. Use the following commands on all nodes on the R1 side to define gatekeepers and associate them with device groups. # symgate -sid 638 define dev 010 # symgate -sid 130 define dev 009 # symgate -sid 638 -g dgoraA associate dev 010 # symgate -sid 130 -g dgoraB associate dev 009 Use the following commands on all nodes on the R2 side to define gatekeepers and associate them with device groups. # symgate -sid 021 define dev 002 # symgate -sid 363 define dev 00B # symgate -sid 021 -g dgoraA associate dev 002 # symgate -sid 363 -g dgoraB associate dev 00B Creating the Consistency Groups To configure consistency groups for using Metrocluster with EMC SRDF, first create device groups and gatekeeper groups as described in previous sections. The following examples are based on the configuration shown in Figure 5-9. Use the following steps for each package: 1. On each node in the cluster, create an empty consistency group using the symcg command. To create a consistency group using PowerPath on the R1 side, use # symcg create cgoradb -ppath -type rdf1 Replace rdf1 with rdf2 in the command to create the consistency group on the R2 side. To create a consistency group using RDF-ECA on the R1 side, use # symcg create cgoradb -rdf_consistency -type rdf1 Replace rdf1 with rdf2 in the command to create the consistency group on the R2 side. 244 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Use the same name on all nodes. To use RDF-ECA, ensure that the RDF process daemon is running on any of the locally attached hosts. For redundancy, it is recommended that you run multiple instances of the RDF daemon on different hosts. For more information on configuring and using RDF-ECA, refer to EMC documentation web site. 2. Add each device that is going to be used in the consistency group. Use the appropriate SID numbers and device names for the data center that the node is a part of. For example, on node1 and node2 in Data Center A. 
# symcg -cg cgoradb -sid 638 add dev 00C # symcg -cg cgoradb -sid 638 add dev 00D # symcg -cg cgoradb -sid 130 add dev 010 # symcg -cg cgoradb -sid 130 add dev 011 And on node3 and node4 in Data Center B. # symcg -cg cgoradb -sid 021 add dev 018 # symcg -cg cgoradb -sid 021 add dev 019 # symcg -cg cgoradb -sid 363 add dev 050 # symcg -cg cgoradb -sid 363 add dev 051 3. Enable the consistency group. # symcg -g cgoradb enable NOTE: 4. This important step must be carried out on every node. Establish the BCV devices in the secondary Symmetrix as a mirror of the standard device. From either node3 or node4. # symmir -cg cgoradb -full est # symmir -cg cgoradb -full est Alternatively, from either node1 or node2. # symmir -cg cgoradb -full est -rdf Creating Volume Groups The following procedures assume the volume groups being created for a cluster and the device groups, as shown in Figure 5-9. Use the following steps on node1: 1. Create the physical volumes. # pvcreate -f /dev/rdsk/c6t0d0 # pvcreate -f /dev/rdsk/c6t0d1 # pvcreate -f /dev/rdsk/c5t0d2 Building a Metrocluster Solution with EMC SRDF 245 # pvcreate -f /dev/rdsk/c5t0d3 2. Create the directories and special files for the volume groups. # mkdir /dev/vgoraA # mkdir /dev/vgoraB # mknod /dev/vgoraA/group c 64 0x01000 # mknod /dev/vgoraB/group c 64 0x02000 3. Create the volume groups. Be careful not to span Symmetrix frames. # vgcreate /dev/vgoraA /dev/rdsk/c6t0d0 # vgextend /dev/vgoraA /dev/rdsk/c6t0d1 # vgcreate /dev/vgoraB /dev/rdsk/c5t0d2 # vgextend /dev/vgoraB /dev/rdsk/c5t0d3 4. Create the logical volumes. (XXXX indicates size in MB) # lvcreate -L XXXX /dev/vgoraA # lvcreate -L XXXX /dev/vgoraB 5. Install a VxFS file system on the logical volumes. # newfs -F vxfs /dev/vgoraA/rlvol1 # newfs -F vxfs /dev/vgoraB/rlvol1 6. Create map files to permit exporting the volume groups to other systems. # vgchange -a n vgoraA # vgchange -a n vgoraB # vgexport -v -s -p -m /tmp/vgoraA.map vgoraA # vgexport -v -s -p -m /tmp/vgoraB.map vgoraB 7. Copy the map files to the other nodes in the cluster. # rcp /tmp/vgoraA.map node2:/tmp/vgoraA.map # rcp /tmp/vgoraB.map node2:/tmp/vgoraB.map 8. Split the SRDF logical links. # symrdf -g dgoraA split -v # symrdf -g dgoraB split -v On node2, node3, and node4, perform the following steps. 1. Create the volume group directories and special files. # mkdir /dev/vgoraA 246 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF # mkdir /dev/vgoraB 2. Import the volume groups to each system: # vgimport -v -s -m /tmp/vgoraA.map vgoraA # vgimport -v -s -m /tmp/vgoraB.map vgoraB 3. After importing volume groups to all the other nodes, establish SRDF links. # symrdf -gdgoraA establish -v # symrdf -gdgoraB establish -v NOTE: While creating a volume group, you can choose either the legacy or agile Device Special File (DSF) naming convention. To determine the mapping between these DSFs, use the # ioscan –m dsf command Creating VxVM Disk Groups using Metrocluster with EMC SRDF If using VERITAS storage, use the following procedure to create disk groups. It is assumed VERITAS root disk (rootdg) has been created on the system where configuring the storage. The following section shows how to set up VERITAS disk groups. On one node do the following: 1. Check to make sure the devices are in a synchronized state. # symrdf -g dgoraA query # symrdf -g dgoraB query 2. Initialize disks to be used with VxVM by running the vxdisksetup command. # /etc/vx/bin/vxdisksetup -i c5t0d0 3. 
Create the disk group to be used by using the vxdg command on the primary system. # vxdg init logdata c5t0d2 c5t0d3 c5t0d0 c5t0d1 4. Verify the configuration. # vxdg list 5. Create the logical volume. # vxassist -g logdata make logfile 2048m 6. Verify the configuration. # vxprint -g logdata 7. Make the filesystem. # newfs -F vxfs /dev/vx/rdsk/logdata/logfile Building a Metrocluster Solution with EMC SRDF 247 8. Create a directory to mount the volume group. # mkdir /logs 9. Mount the volume group: # mount /dev/vx/dsk/logdata/logfile /logs 10. Check if file system exits, then unmount the file system. # umount /logs Validating VxVM Disk Groups using Metrocluster with EMC SRDF The following section shows how to validate VERITAS diskgroups. On one node do the following: 1. Deport the disk group. # vxdg deport logdata 2. Enable other cluster nodes to have access to the disk group. # vxdctl enable 3. Split the SRDF link to enable R2 Read/Write permission. # symrdf -g dgoraA split # symrdf -g dgoraB split 4. Import the disk group. # vxdg -tfC import logdata 5. Start the logical volume in the disk group. # vxvol -g logdata startall 6. Create a directory to mount the volume. # mkdir /logs 7. Mount the volume. # mount /dev/vx/dsk/logdata/logfile /logs 8. Check to make sure the file system is present, then unmount the file system. # umount /logs 9. Establish the SRDF link. # symrdf -g devgrpA establish 248 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF IMPORTANT: VxVM 4.1 does not support the agile DSF naming convention with HP-UX 11i v3. NOTE: In a Metrocluster/SRDF environment, VxVM commands should not be run against write-disabled disks. This is due to VxVM potentially putting these disks into an offline state. Subsequent activation of a VxVM disk group might fail when the disks are again write-enabled, and requires a vxdisk scandisks to be executed prior to disk group activation. Additional Examples of M by N Configurations Figure 5-10 shows a 2 by 1 configuration with BCV’s, which indicates R1 volumes at Data Center A and R2 volumes and BCVs at Data Center B for pkg A and pkg B. Figure 5-10 2 by 1 Configuration Data Center A Data Center B R2 vols R1 vols node3 node1 pkg A node4 SRDF Links BCVs Third Location (Arbitrators) node5 pkg B node6 node2 R1 vols Figure 5-11 shows a bidirectional 2 by 2 configuration with additional packages on node3 and node4, and R1 and R2 volumes at both data centers. In this configuration, R1 volumes and pkg A and pkg B are at Data Center A, and R2 volumes are at Data Center B. R1 volumes for pkg C and pkg D are at Data Center B, and R2 volumes are at Data Center A. Building a Metrocluster Solution with EMC SRDF 249 Figure 5-11 Bidirectional 2 by 2 Configuration Data Center A Data Center B R1 for C & D R1 for A & B node1 node3 pkg C pkg A BCVs node4 pkg B node2 R2 for C & D pkg D R2 for A & B Third Location (Arbitrators) node5 node6 Configuring Serviceguard Packages for Automatic Disaster Recovery Before implementing these procedures it is necessary to do the following: • • • • Configure your cluster hardware according to disaster tolerant architecture guidelines. See the Understanding and Designing Serviceguard Disaster Tolerant Architectures user’s guide. Configure the Serviceguard cluster according to the procedures outlined in Managing Serviceguard user’s guide. Create the EMC Solutions Enabler database, and build Symmetrix device groups, consistency groups, and gatekeepers for each package. 
Export exclusive volume groups for each package as described in “Preparing the Cluster for Data Replication” (page 229). This must be done on each node that will potentially run the package. Install the Metrocluster EMC SRDF product on all nodes according to the instructions in the Metrocluster with EMC SRDF Release Notes. When these steps have been completed, packages will be able to automatically fail over to an alternate node in another data center and still have access to the data it needs to function. 250 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF This procedure must be repeated on all the cluster nodes for each Serviceguard application package so the application can fail over to any of the nodes in the cluster. Customizations include setting environment variables and supplying customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Consult the Managing Serviceguard user’s guide for more detailed instructions on how to start, halt, and move packages and their services between nodes in a cluster. For ease of troubleshooting, it is recommended to configure and test one package at a time. 1. Create a directory /etc/cmcluster/pkgname for each package. # mkdir /etc/cmcluster/pkgname 2. Create a package configuration file. # cd /etc/cmcluster/pkgname # cmmakepkg -p pkgname.ascii Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/ pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. 3. In the .ascii file, list the node names in the order for which the package is to fail over. It is recommended for performance reasons, that the package fail over locally first, then to the remote data center. NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is not access from the host to the package disks. This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices due to the time needed to get device status from the EMC Symmetrix disk array. Clusters with multiple packages that use devices on the EMC Symmetrix disk array will cause package startup time to increase when more than one package is starting at the same time. The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time due to getting status from the Symmetrix. 4. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, Building a Metrocluster Solution with EMC SRDF 251 SERVICE_CMD, and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater. 5. 6. 7. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. In the package_name.ascii file, list the node names in the order in which you want the package to fail over. 
It is recommended, for performance reasons, that the package fail over locally first, then to the remote data center. For the MAX_CONFIGURED_PACKAGES parameter, the minimum value is 0 and maximum default value is 150 (depending on the number of packages that will run on the cluster). Copy the environment file template /opt/cmcluster/toolkit/ SGSRDF/srdf.env to the package directory, naming it pkgname_srdf.env: # cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \ /etc/cmcluster/pkgname/pkgname_srdf.env NOTE: If not use a package name as a filename for the package control script, it is necessary to follow the convention of the environment file name. This is the combination of the file name of the package control script without the file extension, an underscore and type of the data replication technology (srdf) used. The extension .env of the file must be used. The following examples demonstrate how the environment file name should be chosen: Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_srdf.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_srdf.env. 8. 252 Edit the environment file as follows: a. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable. If the software is installed in the default location,/usr/symcli/bin, there is no need to set the PATH environment variable in this file. b. Uncomment AUTO*environment variables. It is recommended to retain the default values of these variables unless there is a specific business requirement to change them. See Appendix B for an explanation of these variables. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to/etc/cmcluster/package_name, removing any quotes around the file names. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF d. Uncomment the RDF_MODE variables and set it to the RDF mode for RDF pairs in the device group to be Synchronous (sync) or Asynchronous (async). e. Uncomment the DEVICE_GROUP variables EMC Symmetrix for the local disk array and set it to the Symmetrix device group names given in the symdg list command. If you are using an M by N configuration, configure the DEVICE_GROUP variable with the name of the consistency group. f. Uncomment the RETRY and RETRYTIME variables. These variables are used to decide how often and how many times to retry the Symmetrix status commands. The defaults should be used for the first package. For other packages RETRYTIME should be altered to avoid contention when more than one package is starting on a node. RETRY * RETRYTIME should be approximately five minutes to keep package startup time under 5 minutes. RETRYTIME RETRY pkgA 5 seconds 60 attempts pkgB 7 seconds 43 attempts pkgC 9 seconds 33 attempts g. Uncomment the CLUSTERTYPE variable and set it to METRO. (The value CONTINENTAL is only for use with the Continentalclusters product, described in Chapter 5.) h. If using an M by N configuration, be sure that the variable CONSISTENCYGROUPS is set to 1 in the environment file CONSISTENCYGROUPS=1 9. 
Distribute Metrocluster with EMC SRDF configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a.rhosts file for root. Root access via .rhosts may create a security issue. 10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname: pkgname.cntl Serviceguard package control script pkgname_srdf.env Metrocluster EMC SRDF environment file pkgname.ascii Serviceguard package ASCII configuration file Building a Metrocluster Solution with EMC SRDF 253 pkgname.sh Package monitor shell script, if applicable other files Any other scripts you use to manage Serviceguard packages The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster/SRDF 11. Check the configuration using the cmcheckconf -P package_name.ascii, then apply the Serviceguard configuration using the cmapplyconf -P package_name.ascii command or SAM. 12. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapplyfor an example of how to automate this task. The script must be customized with the Symmetrix device group names. Redirect the output of this script to a file for debugging purposes. Maintaining a Cluster that uses Metrocluster with EMC SRDF While the cluster is running, all EMC Symmetrix disk arrays that belong to the same Serviceguard package, and are defined in a single SRDF group must be in the same state at the same time. Manual changes of these states can cause the package to halt due to unexpected conditions. In general, it is recommended that no manual change of states should be performed while the package and the cluster are running. There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of Metrocluster EMC SRDF: 1. Stop the package with the appropriate Serviceguard command. # cmhaltpkg pkgname 2. Split the logical SRDF links for the package. # Samples/pre.cmquery 3. Distribute the Metrocluster EMC SRDF configuration changes. # cmapplyconf -P pkgconfig 4. Restore the logical SRDF links for the package. # Samples/post.cmapply 5. Start the package with the appropriate Serviceguard command. # cmmodpkg -e pkgname No checking of the status of the SA/FA ports is done. It is assumed that at least one PVLink is functional. Otherwise, the Volume Group activation will fail. Planned maintenance is treated the same as a failure by the cluster. If the node is taken down for maintenance, package failover and quorum calculation is based on the 254 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF remaining nodes. Make sure that the nodes are taken down evenly at each site, and enough nodes remain on-line to form a quorum if a failure occurs. For examples of failover scenarios, see section, “Example Failover Scenarios with Two Arbitrators” (page 31). 
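The maintenance steps above can be gathered into a single script so that they are always executed in the same order. The following is a minimal sketch only; pkgname and devgrpname are placeholders for your package and Symmetrix device group names, and the customized Samples/pre.cmquery and Samples/post.cmapply scripts can be called in place of the symrdf commands shown here.

#!/usr/bin/sh
# Hypothetical sketch of the normal maintenance sequence for a Metrocluster
# with EMC SRDF package.
PKG=pkgname            # placeholder package name
DEVGRP=devgrpname      # placeholder Symmetrix device group name

# 1. Stop the package.
cmhaltpkg ${PKG}

# 2. Split the logical SRDF links for the package (see Samples/pre.cmquery).
symrdf -g ${DEVGRP} split -v

# 3. Distribute the Metrocluster EMC SRDF configuration changes.
cmapplyconf -P /etc/cmcluster/${PKG}/${PKG}.ascii

# 4. Restore the logical SRDF links (see Samples/post.cmapply).
symrdf -g ${DEVGRP} establish -v

# 5. Re-enable the package.
cmmodpkg -e ${PKG}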
Managing Business Continuity Volumes The use of Business Continuity Volumes is recommended with all implementations of Metrocluster EMC SRDF, and it is required with M by N configurations, which employ consistency groups. These BCV devices will provide a good copy of the data when it is necessary to recover from a rolling disaster—a second failure that occurs while attempting to recover from the first failure. Protecting against Rolling Disasters The following is an example of a rolling disaster with Metrocluster with EMC SRDF At time T0, all the SRDF links go down. The application continues to run on the R1 side. At time T1, the SRDF links are restored, and at T2 a manual resynchronization is started to resync new data from the R1 to the R2 side. At time T3, while resynchronization is in progress, the R1 site fails, and the application starts up on the R2 side. Since the resynchronization did not complete when there was a failure on the R1 side, the data on the R2 side is corrupt. Using the BCV in Resynchronization In the case described above, you use the business continuity volumes, which protect against a rolling disaster. First split off a consistent copy of the data at the recovery site, and then perform the re-synchronization. After the re-synchronization is complete, re-establish the BCV mirroring. To protect data consistency on R2 in rolling disaster, use the following procedures: 1. Before starting the re-synchronization from R1 to R2 side, it is necessary to disable the package switch capability to prevent the package automatically fail over to R2 if a new disaster occurs when the re-sync is still in progress. To disable the package switching on the R2 nodes. # cmmodpkg -d pkgname -n node_name 2. Split the BCV in the secondary Symmetrix from the mirror group to save a good copy of the data from nodes on R2 side. # symmir -g dgname split Alternatively, from node on R1 side. # symmir -g dgname split -rdf 3. Begin to resynchronize the data from R1 to R2 devices. # symrdf -g dgname est Building a Metrocluster Solution with EMC SRDF 255 4. After the resynchronization is completed, enable the package switching on the node on R2 side. # cmmodpkg -e pkgname -n node_name 5. Re-establish the BCV to R2 devices on R2 as a mirror. # symmir -g dgname -full est Alternatively, from node on R1 side. # symmir -g dgname -full est -rdf In Metrocluster with EMC SRDF environment, following the resynchronization process described above, which prevents the package from automatically failing over and starting on the R2 side if a disaster takes place when the resync is in progress. This ensures the package would not automatically start and operate on the inconsistent data in the event of a rolling disaster. As demonstrated above, the re-sync is a manual process and initiated by an operator after the links are fixed. The pairstate of the devices should be Synchronized for SRDF/Synchronous or Consistent for SRDF/Asynchronous when the re-sync is completed. Check the state and ensure that the re-sync is completed before enabling the package switch. If Metrocluster with EMC SRDF is used in Continentalclusters, it is not necessary to disable the package switch on the nodes on recovery site since each site has its own cluster. However, when the re-sync is in progress, make sure the recovery site will not start the recovery operation in the event of a disaster occurring on the primary site. Use the following procedures to protect data consistency on R2 in a Continentalclusters environment: 1. 
Split the BCV in the secondary Symmetrix from the mirror group to save a good copy of the data from nodes on R2 side: # symmir -g dgname split Alternatively, from node on R1 side. # symmir -g dgname split -rdf 2. Begin to resynchronize the data from R1 to R2 devices. # symrdf -g dgname est 3. Re-establish the BCV to R2 devices on R2 as a mirror. # symmir -g dgname -full est Alternatively, from node on R1 side. # symmir -g dgname -full est -rdf 256 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF R1/R2 Swapping This section describes how the R1/R2 swapping can be done via the Metrocluster SRDF package and manual procedures. Each of these methods allows swapping the SRDF personality for each device designation of a specified device group. In this situation, each source R1 device(s) becomes a target R2 device(s), and a target R1 device(s) becomes a source R1 device(s). R1/R2 Swapping using Metrocluster SRDF The Metrocluster SRDF package can be configured to automatically do R1/R2 swapping upon package failover. To enable R1/R2 swapping in the package, set the environment variable AUTOSWAPR2 in the _srdf.env file to 1 or 2. Since the swap is done automatically upon package start up, the Metrocluster SRDF software will only do the swap if the Symmetrix frames and the SRDF links between them are working properly, that is, the SRDF state of the device group is in Synchronized state. If the failover and swap operations succeed, the devices will have their personalities switched, and the data replication will continue from the new R1 devices to the new R2 devices. Prior to Metrocluster performing an R1/R2 swap, if the failover operation fails, the package will not be automatically started. If the failover operation succeeds, but R1/R2 swapping fails, then either the package is automatically started or fails depending on the value of the environment variable AUTOSWAPR2. The environment variable AUTOSWAPR2 can be set to either “1” or “2”. This will depend on whether the package needs to be started automatically on R2, in case of R1/R2 swap failure. If AUTOSWAPR2 is set to “1”, the package will fail to start if R1/R1 swapping fails. In this scenario it is necessary to start the package manually by doing the swap operation. If preferred, this can be done at a later time. If AUTOSWAPR2 is set to “2”, the package is automatically started regardless of a R1/R2 swap failure. In this scenario the data will not be protected remotely. NOTE: When failing over a package with R1/R2 swapping, the package startup time will be longer than without the swapping. R1/R2 Swapping using Manual Procedures It is also possible to do R1/R2 swapping manually. There are two scenarios where manual swapping is supported by Metrocluster with EMC SRDF. Scenario 1: In this scenario, the package failover is due to host failure or due to planned downtime maintenance. The SRDF links and the Symmetrix frames are still up and running. Because the package startup time will be longer if the swapping is done automatically, the user can choose not to have the swapping done by the package and then manually execute the swapping after the package is up and running on the R2 Building a Metrocluster Solution with EMC SRDF 257 side. Following is the manual procedure to swap the devices personalities and change the direction of the data replication. On the host that connects to the R2 side, use the following steps: 1. 
Swap the personalities of the devices and mark the old R1 devices to be refresh from the old R2 devices. # symrdf -g swap -refresh R1 2. After swapping is completed, the devices will be in Suspended state. Next establish the device group for data replication from the new R1 devices to the new R2 devices. # symrdf -g establish Scenario 2: In this scenario, two failures happen before the package fails over to the secondary data center. The SRDF link fails; the package continues to run and write data on R1 devices. Sometime later, the host fails; the package then fails over to the secondary data center. In this case, even if the AUTOSWAPR2 variable is set to 1 or 2, the package will not do the R1/R2 swapping, which happens after the host in the primary data center and the SRDF links are fixed. To minimize the application down time, instead of failing the application back to the primary data center, leave the application running in the secondary data center. Then manually swap the devices personalities and change the direction of the data replication. 1. Swap the personalities of the devices and mark the old R1 devices to be refresh from the old R2 devices. # symrdf -g swap -refresh R1 2. After swapping is completed, the devices will be in a suspended state. Next Establish the device groups for data replication from the new R1 devices to the new R2 devices. # symrdf -g establish CAUTION: R1/R2 Swapping cannot be used in an M by N Configuration. Some Further Points Following are listed some EMC Symmetrix specific requirements: • • 258 R1 and R2 devices have been correctly defined and assigned to the appropriate nodes in the internal configurations that is downloaded by EMC support staff. R1 devices are locally protected (RAID 1 or RAID S); R2 devices are locally protected (RAID 1, RAID S or BCV). Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF NOTE: It is highly recommended that the R2 device is locally protected with RAID 1 or RAID S. If the R2 device is protected with BCV, and if it fails and there is a failover, the package cannot operate on the BCV device. The R2 device has to be fixed, the data has to be restored from the BCV device to the new R2 device, before the package can start. • • Only synchronous and asynchronous modes are supported; adaptive copy must be disabled. Domino Mode enabled is required for M x N configuration to ensure the following: — data currency on all Symmetrix frames — there is no possibility of inconsistent data at the R2 side in case of SRDF links failure If Domino Mode is not enabled and all SRDF links fail, the new data is not replicated to the R2 side while the application continues to modify the data on the R1 side. This will result in the R2 side containing a copy of the data only up to the point of the Continuous Access link failure. If additional failure occurs, such as a system failure before the SRDF link is fixed, it will cause the application to fail over to the R2 side with only non-current data. If Domino Mode is not enabled, in the case of a rolling disaster, the data may be inconsistent. Additional failures may take place before the system has completely recovered from a previous failure. 
Inconsistent and therefore unusable data will result from the following sequence of circumstances: — — — — — — Domino Mode is not enabled the SRDF links fail the application continues to modify the data the link is restored resynchronization from R1 to R2 starts, but does not finish the R1 side fails Although the risk of this occurrence is extremely low, if the business cannot afford even a minor amount risk, then it is required to enable Domino Mode to ensure that the data at the R2 side are always consistent. The disadvantage of enabling Domino Mode is that when the SRDF link fails, all I/Os will be refused (to those devices) until the SRDF link is restored, or manual intervention is undertaken to disable Domino Mode. Applications may fail or may continuously retry the I/Os (depending on the application) if Domino Mode is enabled and the SRDF link fails. Some Further Points 259 NOTE: • • • • Domino Mode is not supported in asynchronous mode. SRDF firmware has been configured and hardware has been installed on both Symmetrix units. R1 and R2 devices must be correctly defined and assigned to the appropriate host systems in the internal configuration that is downloaded by EMC. While the cluster is running, all Symmetrix devices that belong to the same Serviceguard package, and defined in a single SRDF device group must be in the same state at the same time. Manual changes of these states can cause the package to halt due to unexpected conditions. In general, it is recommended that no manual change of states be performed while the package and the cluster are running. A single Symmetrix device group must be defined for each package on each host that is connected to the Symmetrix. The disk special device file names for all Volume Groups that belong to the package must be defined in one Symmetrix device group for both R1 side and R2 side. The Symmetrix device group name must be the same on each host for both R1 side and R2 side. This group name is placed variable DEVICE_GROUP defined in the pkg.env file. Although the name of the device group must be the same on each node, the special device file names specified may be different on each node. Symmetrix Logical Device names MUST be default names of the form “DEVnnn” (for example, DEV001). Do not use the option for creating your own device names. See the EMC Solutions Enabler manual, and the sample convenience scripts in the Samples directory included with this toolkit. • To minimize contention, each device group used in the package should be assigned two unique gatekeeper devices on the Symmetrix for each host where the package will run. These gatekeeper devices must be associated with the Symmetrix device groups for that package. The gatekeeper devices are typically a 2 MB logical device on the Symmetrix. For example, if a package is configured to failover across four nodes in the cluster, there should be eight gatekeeper devices (two for each node) that are assigned to the Symmetrix device group belonging to this package. It is required that there be a pool of four additional gatekeeper devices that are NOT associated with any device group. These gatekeepers would be available for other, non-cluster uses, for example, the Symmetrix Manager GUI and other EMC Solutions Enabler or SymAPI requests. After data configuration, each physical device in the Symmetrix has enough space remaining on it for gatekeeper purposes. 
260 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF • • • • This toolkit does not support the HP OmniBack Integration with Symmetrix. The OmniBack Integration with Symmetrix may create certain states that will cause this package to halt if a failover occurs while the backup is in progress. No checking of the status of the SA/FA ports is done. It is assumed that at least one PVLink is functional. Otherwise, the VG activation will fail. This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices due to the time needed to get device status from the Symmetrix. Clusters with multiple packages that use devices on the Symmetrix will cause package startup time to increase when more than one package is starting at the same time. The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a large enough value to account for the extra startup time due to getting status from the Symmetrix. See the previous paragraph for more information on the extra startup time. Metrocluster with SRDF/Asynchronous Data Replication The following sections presents concepts, functionality and requirements for configuring Metrocluster using SRDF/Asynchronous data replication. SRDF/Asynchronous delivers asynchronous data replication solutions featuring a consistent and restartable copy of the production data at the remote side. Metrocluster with EMC SRDF supports SRDF/Asynchronous to further enhance and protect critical business information. The topics discussed in this section are as follows: • • • • • Overview of SRDF/Asynchronous Concepts Requirements for using SRDF/Asynchronous in a Metrocluster Environment Preparing the Cluster for SRDF/Asynchronous Data Replication Building a Device Group for SRDF/Asynchronous Limitations and Restrictions Overview of SRDF/Asynchronous Concepts SRDF/Asynchronous provides a long-distance replication solutions with minimal impact on performance. This protection level is intended for customers requiring minimal host application impact, but need to maintain a restartable copy of data at R2 site. Data is transferred from R1 site to the R2 site in predefined timed cycles called delta sets, which eliminates the redundancy of same track changes being transferred over the link. In the event of a disaster at the R1 site or if SRDF links are lost during data transfer, a partial delta set of data is discarded. However, a dependent write consistent point-in-time copy of data is retained on the target side. Figure 5-12 depicts the SRDF/Asynchronous data sets. • At the R1 site, the capture cycle is collecting all new writes and tagging them as belonging to cycle N. There is also a transmit cycle (N-1) which is not receiving any new data, but is transferring the data it has collected when it was the active Metrocluster with SRDF/Asynchronous Data Replication 261 • cycle to the remote side. The capture cycle switches roles from capture to transmit during the cycle switch process and a new capture cycle is created. At the R2 site, there is a receive cycle (N-1), which is receiving data from the transmit cycle at R1. The apply cycle (N-2) at the remote site is marking all the tracks from a previous cycle as write-pending to the secondary devices (R2). The data is considered committed to the R2 side devices at cycle switch time. 
Figure 5-12 SRDF/Asynchronous Basic Functionality Host Host I/O SRDF/A Device pair Active session R1 R2 SRDF/A Delta set begins Capture Cycle N Transmit Cycle N-1 Capture new writes Cycle Transfer writes to R2 SRDF Links Writes apply To R2 device Apply Cycle N-2 Receive Writes R2 device Receive Cycle N-1 Requirements for using SRDF/Asynchronous in a Metrocluster Environment The following describes the hardware and software requirements for setting up SRDF/Asynchronous in a Metrocluster environment: 262 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Hardware Requirements • • EMC supports SRDF/Asynchronous on Symmetrix DMX Series only. The model numbers and the supported Enginuity level are available in EMC Symmetrix SRDF Product Guide. SRDF/Asynchronous supports all SRDF topologies including ESCON, point-to-point and switched fabrics. Refer to the EMC Network Storage Topology Guide for details and to plan the connectivity based on your distance requirements. Software Requirements • • • EMC Solutions Enabler (requires minimum version 6.0). Enginuity - Refer to the Disaster Tolerant Clusters Products Compatibility and Feature Matrix, for specific version information. An SRDF/Asynchronous license is required to access this functionality. For the most recent version and compatibility information, refer to the Disaster Tolerant Clusters Products Compatibility and Feature Matrix (Metrocluster/EMC SRDF – MC/SRDF). Preparing the Cluster for SRDF/Asynchronous Data Replication The following sections, “Metrocluster with SRDF/Asynchronous Data Replication”, and “Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous” describe architectures and configurations for preparing SRDF/Asynchronous data replication. Metrocluster SRDF Topology using SRDF/Asynchronous Figure 5-13 shows the recommended and supported disaster tolerant architecture in a Metropolitan cluster, using SRDF/Asynchronous data replication. The architecture consists of two main data centers and a third location with arbitrator nodes or quorum server nodes. Metrocluster with SRDF/Asynchronous Data Replication 263 Figure 5-13 Metrocluster Topology using SDRF/Asynchronous QS Ethernet Network NS NS NS NS NS Node A Node B Node C Node D D W D M FCS D W D M FCS DMX R1 Site FCS FCS FCS IP Network FCS FCS DMX FCS R2 Site Data replication can utilize any extended SAN devices that support SRDF Links, for example DWDM, Fiber Channel over Internet Protocol, etc. However, since the network for a Serviceguard cluster heartbeat requires a “Dark Fiber” link, it is recommended to utilize the DWDM links for SRDF/Asynchronous data replication. This will increase data replication bandwidth and reliability in the Metrocluster environment. Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous The following sections, “Building a Device Group for SRDF/Asynchronous”, and “Package Configuration using SRDF/Synchronous or SRDF/Asynchronous” describe the steps for building a device group and package configuration in an SRDF/Asynchronous environment. Building a Device Group for SRDF/Asynchronous To perform an operation on a device group for SRDF/Asynchronous data replication, the device group must be configured with all the devices that are SRDF/Asynchronous capable within the RDF group. Use the following steps to create a device group: 264 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 1. 
List SRDF/Asynchronous capable devices on the source Symmetrix unit and be sure the SRDF/Asynchronous capable devices are mapped to RDF group for use. In addition, all RDF devices that belong to the RDF (RA) group must be configured in one device group for SRDF/Asynchronous operation. For example, use the following command to display the devices from RDF (RA) group number 5: # symrdf -sid symid list -rdfa -rdfg 5 Symmetrix ID: 000187400684 Local Device View ------------------------------------------------------------------------STATUS MODES RDF S T A T E S Sym RDF --------- ----- R1 Inv R2 Inv ---------------------Dev RDev Typ:G SA RA LNK MDA Tracks Tracks Dev RDev Pair ---- ---- ------ --------- ----- ------- ------- --- ---- ------------0196 0197 0198 0199 019A 019B 019C 019D 2. 0012 0013 0014 0015 0016 0017 0018 0019 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 R1:5 RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW RW S.. S.. S.. S.. S.. S.. S.. S.. 0 0 0 0 0 0 0 0 0 0 0 0 RW WD Synchronized RW WD Synchronized RW WD Synchronized RW WD Synchronized 0 RW WD Synchronized 0 RW WD Synchronized 0 RW WD Synchronized 0 RW WD Synchronized Create an RDF1 type device group. For example, the group name AsynDG. On R1 side: # symdg create AsynDG -type RDF1 On R2 side: # symdg create AsynDG -type RDF2 3. All devices from the RDF (RA) group configuration are added to the device group for SRDF/Asynchronous operation. For example, if the RDF group displayed in the symrdflist display is group number 5, then all devices in this RDF group must be managed together within one device group for SRDF/Asynchronous operation. # symld -g AsynDG addall -rdfg 5 4. 5. Repeat the steps 1-3 on each host that need to run Serviceguard packages. Query the device group to display the R1-to-R2 setup and the state of the SRDF/Asynchronous device pairs. # symld -g AsynDG query -rdfa Sample output from the command: Source (R1) View Target (R2) View MODES ------------------------------------------------------- ----- -----------ST LI ST Standard A N A Logical T R1 Inv R2 Inv K T R1 Inv R2 Inv RDF Pair Device Dev E Tracks Tracks S Dev E Tracks Tracks MDAC STATE -------------------------------- -- ------------------------ ----- -----------DEV001 DEV002 DEV003 0196 RW 0197 RW 0198 RW 0 0 0 0 RW 0012 WD 0 RW 0013 WD 0 RW 0014 WD 0 0 0 0 S... 0 S... 0 S... Synchronized Synchronized Synchronized Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous 265 DEV004 DEV005 DEV006 DEV007 DEV008 6. 0199 019A 019B 019C 019D RW RW RW RW RW 0 0 0 0 0 0 0 0 0 RW 0015 WD RW 0016 WD RW 0017 WD RW 0018 WD 0 RW 0019 WD 0 0 0 0 0 0 0 0 0 S... Synchronized S... Synchronized S... Synchronized S... Synchronized 0 S... Synchronized Set the device group to Asynchronous mode: # symrdf -g AsynDG set mode async 7. Consistency protection must be enabled to ensure the data consistency on R2 side for the SRDF/Asynchronous devices in the device group. # symrdf -g AsynDG enable 8. If the SRDF pairs are not in a Consistent state at this point, initiate an establish command to synchronize the data on the R2 side from the R1 side. The device state will be SyncInProg until the Consistent status is reached. 
# symrdf -g AsynDG establish Sample output after the RDF pairs have been established: Source (R1) View Target (R2) View MODES ------------------------------------------- ----ST LI ST Standard A N A Logical T R1 Inv R2 Inv K T R1 Inv R2 Inv RDF Pair Device Dev E Tracks Tracks S Dev E Tracks Tracks MDAC STATE ------------- -- ------------------------ ----- -----------DEV001 DEV002 DEV003 DEV004 DEV005 DEV006 DEV007 DEV007 0196 0197 0198 0199 019A 019B 019C 019D RW RW RW RW RW RW RW RW 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RW RW RW RW RW RW RW RW 0012 0013 0014 0015 0016 0017 0018 0019 WD WD WD WD WD WD WD WD 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A..X A..X A..X A..X A..X A..X A..X A..X Consistent Consistent Consistent Consistent Consistent Consistent Consistent Consistent Package Configuration using SRDF/Synchronous or SRDF/Asynchronous The following describes configuring a package using SRDF/Synchronous or SRDF/Asynchronous MSC for first-time installation or pre-existing installations. First-time installation of Metrocluster with EMC SRDF using SRDF/Synchronous If this is a first-time installation of Metrocluster with EMC SRDF do the following steps: 1. 2. 266 Copy the template file that is shipped with the Metrocluster with EMC SRDF product from /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory. Customize the template file based on the requirements in your environment. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Pre-existing Installations of Metrocluster SRDF using SRDF/Synchronous If there is a pre-existing installation of Metrocluster with EMC SRDF and the Serviceguard applications using SRDF/Synchronous data replication, make the following changes: • Set new variable RDF_MODE to sync RDF_MODE=sync If RDF_MODE is not present, the synchronous mode is assumed. • Unset the SYMCLI_MODE environment variable if it has been set previously. Migration of Existing Applications from SRDF/Synchronous to SRDF/Asynchronous EMC does not support migration on the existing applications from SRDF/Synchronous to SRDF/Asynchronous data replication. Applications that needs to use SRDF/Asynchronous mode for data replication should configure a new device group in SRDF/Asynchronous mode. The need to alter the data replication mode between synchronous and asynchronous for an application is not expected in a typical Disaster Tolerant environment. Contact EMC and HP for specific requirement to your environment. Package Failover using SRDF/Asynchronous The EMC Solutions Enabler provides a control operation checkpoint to confirm that the data written in the current SRDF/Asynchronous cycle has been successfully committed to the R2 side. When a package fails over to secondary site, Metrocluster with EMC SRDF ensures the most current data when the SRDF link is still up. Metrocluster with EMC SRDF will invoke the action checkpoint prior to fail over to the storage. Since the checkpoint operation will prolong the failover time for a package to start, the duration for a package to start on R2 side will be longer. The time taken to complete the checkpoint operation depends on the cycle time configured which determines the amount of data outstanding on the R1 site. Protecting against a Rolling Disaster It is recommended to use the procedure described in the previous section for protecting against rolling disaster situation. Limitations and Restrictions • • • Consistency Group with SRDF/Asynchronous mode is not supported. Domino mode is not supported in SRDF/Asynchronous mode. 
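Pulling the earlier package-configuration guidance together, the following is a minimal sketch of the entries that might be edited in the environment file of a hypothetical package pkgA that replicates through an SRDF/Asynchronous device group named AsynDG. The variable names come from the srdf.env template shipped in /opt/cmcluster/toolkit/SGSRDF; the values shown are illustrative only, and the AUTO* variables should normally be uncommented with their template defaults left unchanged (see Appendix B).

# Excerpt from a hypothetical /etc/cmcluster/pkgA/pkgA_srdf.env
# PATH: only needed if Solutions Enabler is not installed in /usr/symcli/bin.
# AUTO* variables: uncomment them and keep the template defaults.
PKGDIR=/etc/cmcluster/pkgA       # directory holding the package control script
RDF_MODE=async                   # sync or async, matching the device group
DEVICE_GROUP=AsynDG              # device group (consistency group for M by N)
RETRYTIME=5                      # seconds between Symmetrix status retries
RETRY=60                         # RETRY * RETRYTIME of about five minutes
CLUSTERTYPE=METRO                # CONTINENTAL only with Continentalclusters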
Metrocluster with ECM SRDF does not support cascading configuration using SRDF/Asynchronous. Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous 267 Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication The following sections present the concepts, functionality and requirements for configuring Metrocluster using SRDF/Asynchronous (SRDF/A) Multi-Session Consistency (MSC) Data Replication. The topics discussed in this section are as follows: • • • • Overview SRDF/Asynchronous MSC Concepts Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi Session Consistency (MSC) Data Replication Building a Composite Group for SRDF/Asynchronous MSC Package Configuration using SRDF/Synchronous or SRDF/Asynchronous Overview of SRDF/Asynchronous MSC Concepts When a database is spread across multiple Symmetrix arrays and SRDF/A is used for long distance replication, separate software must be used to manage the coordination of the delta set boundaries between the participating Symmetrix arrays or RDF groups and to stop replication if any of the volumes in a Symmetrix array or RDF group cannot replicate for any reason. The software must ensure that all delta set boundaries on every participating Symmetrix array in the configuration are coordinated to give a dependent write consistent point-in-time image of the data. RDF-Multi Session Consistency (RDF-MSC) is the new technology that provides consistency across either multiple RDF groups or multiple Symmetrix arrays. RDF-MSC is supported by an SRDF process daemon that performs cycle switching and cache recovery operations across all SRDF/A sessions in the group. This ensures that a dependent write consistent R2 copy of the data exists at the remote site at all the times. From a single Symmetrix array perspective, the I/O is processed exactly the same way in SRDF/CG multi-session mode for SRDF/A, as in a single-session mode. Following is the sequence of tasks that are completed while processing an I/O: 1. 2. 3. The host writes to cycle N (capture cycle) on the R1 side. SRDF/A transfers cycle N-1 (transmit cycle) from R1 to R2. The receive cycle N-1 on the R2 side receives data from the transmit cycle, and the apply cycle N-2 restores data to the R2 devices. During this process, the status and location of the active and inactive cycles are communicated between the R1 and R2 Symmetrix systems. For example, when R1 finishes sending cycle N–1, it sends a special indication to R2 to let it know that it has completed the inactive cycle transfer. Similarly, when R2 finishes restoring cycle N – 2, it sends a special indication to let R1 know that its active cycle is empty. At this point in the process, (that is, when the transmit cycle on the R1 side is empty and the apply cycle on the R2 side is empty), SRDF/A is ready for a cycle switch. 268 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF In a single-session mode, once all the conditions are satisfied, an SRDF/A cycle switch is performed. However, when using SRDF/A MSC, it indicates switch readiness of the array or the RDF group to the host which is polling for this condition. The cycle switch command is issued by the host only when all Symmetrix systems or SDRF groups indicate their switch readiness. Hence, the cycle switch is coordinated across multiple Symmetrix systems or RDF groups. SRDF/A enters a multi-session mode after receiving a command from the host. 
As part of the command to enter the multi-session mode, and with each subsequent switch command issued, the host provides a tag to each capture cycle that is retained throughout that cycle life. This cycle tag is a value that is common across all participating SRDF/A sessions and eliminates the need to synchronize the cycle numbers across them. The cycle tag is the mechanism by which consistency is assured. Multi-session SRDF/A performs a coordinated cycle switch during a very short window of time when there are no host writes being completed. This time period is referred to as an SRDF/A window. When the host discovers that all systems are ready for a cycle switch, it issues a single command to each Symmetrix system that performs a cycle switch to open the SRDF/A window. When the window is open, any I/Os that start will be disconnected, and as a result no dependent I/Os will even be issued by any host to any devices in the multi-session group. The SRDF/A window remains open on each Symmetrix system until the last Symmetrix system in the multi-session group acknowledges to the host that the switch and open command has been processed and a close command has been received. As a result, a dependent write consistency across the SRDF/A multi-session group is created, and once the SRDF/A window has been opened and the cycle has successfully switched on all Symmetrix systems, the SRDF/A window can then be closed by the SRDF/A MSC software, allowing all disconnected writes to complete and normal processing to resume. As part of this switch and open operation, SRDF/A MSC assigns a cycle tag value to the active cycle. This cycle tag value is separate from the cycle number assigned internally by SRDF/A. This cycle tag is carried by the SRDF/A process to the remote side and is used by the host at the recovery site to ensure that only data from the same host cycle is applied to the R2 devices in each Symmetrix system in the event of a disaster. Once all Symmetrix systems have completed a cycle switch, the host issues a command to close the window (turn off the bit in the state table), and all disconnected write I/Os complete. During this window, read I/Os complete normally to any devices or PAV aliases that have not received a write. The SRDF/A window is an attribute of the SRDF/A group and is checked at the start of each I/O, at no additional overhead, because the host adapter is already obtaining the cycle number from global memory as part of SDRF/A’s existing minimal overhead. The RDF daemon is responsible for coordinating cycle switch between different SRDF/A session in the consistency group so that data Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication 269 is consistent. SRDF/A MSC supports RDF daemon to be enabled on a single host or on multiple hosts. The RDF daemon is responsible for coordinating cycle switch between different SRDF/A session in the consistency group so that data is consistent. SRDF/A MSC supports RDF daemon to be enabled on a single host or on multiple hosts. It is recommended that you enable the daemon on multiple hosts. 
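Before depending on MSC protection, it can be useful to confirm from one of the attached hosts that the MSC session is active and that the composite group is in a Consistent state. The following is a minimal sketch only; MSCcg is a placeholder composite group name, and the check simply searches the symrdf query output for the "MSC Session Status" and "Consistency State" fields shown in the sample output later in this section.

#!/usr/bin/sh
# Hypothetical check of the SRDF/A MSC session and consistency state.
CG=MSCcg    # placeholder composite group name

OUT=$(symrdf -cg ${CG} query -rdfa)

echo "${OUT}" | grep -q "MSC Session Status *: *Active" || {
    echo "MSC session for ${CG} is not active"; exit 1; }

echo "${OUT}" | grep -q "Consistency State *: *CONSISTENT" || {
    echo "Composite group ${CG} is not in a CONSISTENT state"; exit 1; }

echo "Composite group ${CG} is consistent and the MSC session is active"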
Figure 5-14 Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication (The figure shows host I/O entering the primary Symmetrix arrays, where the capture cycle N and transmit cycle N-1 delta sets are sent over the SRDF links to the receive cycle N-1 and apply cycle N-2 on the secondary Symmetrix arrays, which apply the writes to the R2 devices.)

Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous Multi-Session Consistency (MSC) Data Replication

The following sections describe the steps for building a composite group and package configuration in an SRDF/Asynchronous MSC environment. The following topics are discussed in this chapter:
• Building a Composite Group for SRDF/Asynchronous MSC
• Configuring a Package using SRDF/Asynchronous MSC
• Setting up the RDF Daemon

Building a Composite Group for SRDF/Asynchronous MSC

To perform an operation on a composite group for SRDF/Asynchronous MSC data replication, the composite group must be configured with devices that are SRDF/Asynchronous capable within the RDF group. Use the following steps to create a composite group.

1. List the SRDF/Asynchronous capable devices on the source Symmetrix unit and verify that they are mapped to the RDF group that will be used. For example, use the following command to display the devices from RDF (RA) group number 5:
# symrdf -sid symid list -rdfa -rdfg 5

Symmetrix ID: 000187400684
                    Local Device View
-----------------------------------------------------------------------
            STATUS      MODES   R1 Inv   R2 Inv   RDF  STATES
Sym   RDF
Dev   RDev  Typ:G   SA RA LNK   MDA   Tracks   Tracks   Dev  RDev  Pair
----  ----  ------  ---------   ---   ------   ------   ---  ----  ------------
0196  0012  R1:5    RW RW RW    S..        0        0   RW   WD    Synchronized
0197  0013  R1:5    RW RW RW    S..        0        0   RW   WD    Synchronized
0198  0014  R1:5    RW RW RW    S..        0        0   RW   WD    Synchronized

2. Create a composite group for MSC, for example MSCcg.
On the R1 site, run the following command:
# symcg create MSCcg -type RDF1 -rdf_consistency
On the R2 site, run the following command:
# symcg create MSCcg -type RDF2 -rdf_consistency

3. Add all devices from the configured RDF (RA) groups to the composite group for SRDF/Asynchronous MSC operation. In this example, RDF groups 6 and 7 are added to the composite group MSCcg:
# symcg -cg MSCcg -rdfg 6 addall pd
# symcg -cg MSCcg -rdfg 7 addall pd

4. Repeat steps 1 through 3 on each host that needs to run Serviceguard packages.

5. Query the composite group MSCcg to display the R1-to-R2 setup and the state of the SRDF/Asynchronous device pairs:
# symrdf -cg MSCcg query -rdfa
Following is a sample output of the command:

         Source (R1) View                Target (R2) View
         --------------------------      --------------------------
Logical  Sym        R1 Inv   R2 Inv  LI  Sym        R1 Inv   R2 Inv          RDF Pair
Device   Dev   ST   Tracks   Tracks  NK  Dev   ST   Tracks   Tracks   MDAC   STATE
-------  ----  --   ------   ------  --  ----  --   ------   ------   ----   ------------
DEV001   0196  RW        0        0  RW  0012  WD        0        0   S...   Synchronized
DEV002   0197  RW        0        0  RW  0013  WD        0        0   S...   Synchronized
DEV003   0198  RW        0        0  RW  0014  WD        0        0   S...   Synchronized

DEV001   01B6  RW        0        0  RW  0326  WD        0        0   S...   Synchronized
DEV002   01B7  RW        0        0  RW  0327  WD        0        0   S...   Synchronized
DEV003   01B8  RW        0        0  RW  0328  WD        0        0   S...   Synchronized
DEV004   01B9  RW        0        0  RW  0329  WD        0        0   S...   Synchronized
DEV005   01BA  RW        0        0  RW  032A  WD        0        0   S...   Synchronized
DEV006   01BB  RW        0        0  RW  032B  WD        0        0   S...   Synchronized
DEV007   01BC  RW        0        0  RW  032C  WD        0        0   S...   Synchronized
DEV008   01BD  RW        0        0  RW  032D  WD        0        0   S...   Synchronized

6. Set the composite group to asynchronous mode:
# symrdf -cg MSCcg set mode async

7. Enable consistency protection to ensure data consistency on the R2 side for the SRDF/Asynchronous devices in the composite group:
# symrdf -cg MSCcg enable

8. If the SRDF pairs are not in a consistent state at this point, initiate an establish operation to synchronize the data on the R2 side from the R1 side. The device state will be SyncInProg until the Consistent state is reached.
# symrdf -cg MSCcg establish
# symrdf -cg MSCcg query -rdfa

RDFA MSC Info {
    MSC Session Status   : Active
    Consistency State    : CONSISTENT
}

         Source (R1) View                Target (R2) View
         --------------------------      --------------------------
Logical  Sym        R1 Inv   R2 Inv  LI  Sym        R1 Inv   R2 Inv          RDF Pair
Device   Dev   ST   Tracks   Tracks  NK  Dev   ST   Tracks   Tracks   MDAC   STATE
-------  ----  --   ------   ------  --  ----  --   ------   ------   ----   ------------
DEV009   0196  RW        0        0  RW  005A  WD        0        0   A...   Consistent
DEV010   0197  RW        0        0  RW  005B  WD        0        0   A...   Consistent
DEV011   0198  RW        0        0  RW  005C  WD        0        0   A...   Consistent

DEV001   01B6  RW        0        0  RW  0326  WD        0        0   A..X   Consistent
DEV002   01B7  RW        0        0  RW  0327  WD        0        0   A..X   Consistent
DEV003   01B8  RW        0        0  RW  0328  WD        0        0   A..X   Consistent
DEV004   01B9  RW        0        0  RW  0329  WD        0        0   A..X   Consistent
DEV005   01BA  RW        0        0  RW  032A  WD        0        0   A..X   Consistent
DEV006   01BB  RW        0        0  RW  032B  WD        0        0   A..X   Consistent
DEV007   01BC  RW        0        0  RW  032C  WD        0        0   A..X   Consistent
DEV008   01BD  RW        0        0  RW  032D  WD        0        0   A..X   Consistent

Configuring a Package using SRDF/Asynchronous MSC

The following sections describe how to configure a package using SRDF/Asynchronous MSC for a first-time installation or for a pre-existing installation.

Initial installation of Metrocluster with EMC SRDF

If you are installing Metrocluster with EMC SRDF using SRDF/Asynchronous MSC for the first time, complete the following procedure:
1. Copy the template file /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory.
2. Change the value of the RDF_MODE variable to asynchronous:
RDF_MODE=async
3. Change the value of the CONSISTENCYGROUPS variable to 1:
CONSISTENCYGROUPS = 1

Metrocluster with EMC SRDF is already installed

If Metrocluster SRDF is already installed and the Serviceguard applications use SRDF/Asynchronous data replication, you must make the following changes in the srdf.env file:
1. Change the value of the CONSISTENCYGROUPS variable to 1:
CONSISTENCYGROUPS = 1
2.
Clear the value set for the SYMCLI_MODE environment variable, if set previously. Setting up the RDF Daemon The cycle switch process required for SRDF/A MSC is provided by the Solutions Enabler software executing a RDF daemon that implements the MSC functionality. You can enable or disable this RDF daemon on each host using the SYMAPI_USE_RDFD option in the SYMAPI file. The use of this RDF daemon is enabled or disabled on each host using the SYMAPI options file. The default value of the SYMAPI_USE_RDFD option is Disable. To enable the RDF daemon, make the following changes: #cd /var/symapi/config# vi optionsSYMAPI_USE_RDFD = ENABLE Setting this option to ENABLE activates the RDF daemon for SRDF/A MSC. It is recommended that you enable the daemon on multiple hosts. Metrocluster with SRDF/Asynchronous Multi-Session Consistency Data Replication 273 Starting and Stopping the Daemon There are multiple ways in which you can start the RDF daemon. You must start the daemon on all nodes in the cluster. If you have enabled the RDF daemon, Solutions Enabler software automatically starts the daemon. Alternatively, you can manually start the daemon using the stordaemon command: stordaemon start storrdfd [-wait Seconds] By default, the stordaemon command waits 30 seconds to verify that the daemon is running. To override this default value, use the -wait option. In addition, you can set the daemon to start automatically every time the local host is started. You can set this condition for the daemon using the following command: stordaemon install storrdfd -autostart To stop the RDF daemon, run the following command: stordaemon stop storrdfd [-wait Seconds] Building a Continental Cluster Solution with EMC SRDF The following section describes how to configure a continental cluster solution using EMC SRDF, which requires the Metrocluster with EMC SRDF product. Setting up a Primary Package on the Primary Cluster Use the procedures in this section to configure a primary package on the primary cluster. Consult the Managing Serviceguard user’s guide for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. 1. 2. 3. If this was not done previously, split the EMC SRDF logical links for the disks associated with the application package. See the script, Samples/pre.cmquery (edit to the SRDF groups configured) for an example of how to automate this task. The script must be customized with the Symmetrix device group names. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user’s guide. Install Continentalclusters on all the cluster nodes in the primary cluster (Skip this step if the software has been pre installed) NOTE: Serviceguard should already be installed on all the cluster nodes. Run swinstall(1m)to install Continentalclusters and Metrocluster with EMC SRDF products from an SD depot. 4. When swinstall(1m) has completed, create a directory as follows for the new package in the primary cluster. # mkdir /etc/cmcluster/ 274 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 5. Copy the environment file template /opt/cmcluster/toolkit/SRDF/ srdf.env to the package directory, naming it pkgname_srdf.env: # cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \ /etc/cmcluster/pkgname/pkgname_srdf.env 6. Create an Serviceguard Application package configuration file. 
# cd /etc/cmcluster/ # cmmakepkg -p .conf Customize it as appropriate to your application. Be sure to include Node names, and the path name of the control script (/etc/cmcluster// .cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters. Also change AUTO_RUN (PKG_SWITCHING_ENABLED in Serviceguard A.11.09) to NO. This will ensure that the application packages will not start automatically. (the ccmonpkg will be set to yes) Define the service (as required) 7. Create a package control script. # cmmakepkg -s pkgname.cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater. 8. 9. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. Edit the environment file _srdf.env as follows: a. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable. The default location is /usr/ symcli/bin. b. Uncomment AUTO*environment variables. It is recommended to retain the default values of these variables unless there is a specific business requirement to change them. See Appendix B for an explanation of these variables. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/. d. Uncomment the DEVICE_GROUP variable and set them to the Symmetrix device group names given in the ’symdg list’ command. The DEVICE_GROUP variable may also contain the consistency group name if using a M by N configuration. e. Uncomment the RETRY and RETRYTIME variables. The defaults should be used for the first package. The values should be slightly different for other packages. Building a Continental Cluster Solution with EMC SRDF 275 RETRYTIME should increase by two seconds for each package. The product of RETRY * RETRYTIME should be approximately five minutes. These variables are used to decide how often and how many times to retry the Symmetrix status commands. For example, if there are three packages with data on a particular Symmetrix pair (connected by SRDF), then the values for RETRY and RETRYTIME might be as follows: Table 5-3 RETRY and RETRYTIME Values RETRYTIME RETRY pkgA 60 5 pkgB 43 7 pkgC 33 9 f. Uncomment the CLUSTER_TYPE variable and set it to “continental”. g. Uncomment the RDF_MODE and set it to “asyc” or “sync” as appropriate to your application. 10. Edit the remaining control script variables (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application as it runs on the primary cluster. See the Managing Serviceguard manual for more information on these variables. 11. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Serviceguard manual for more information on these functions. 12. Distribute EMC SRDF package configuration, environment, and control script files to other nodes in the primary cluster by using ftp or rcp. 
# rcp -p /etc/cmcluster//.cntl \ other_node:/etc/cmcluster//.cntl When using ftp, be sure to make the file executable on any destination systems. 13. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/: • • • • .cntl (EMC SRDF package control script) .conf (Serviceguard package ASCII config file) .sh (Package monitor shell script, if applicable) _srdf.env (Metrocluster EMC SRDF environment file) 14. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmqueryfor an example of how to automate this task. The script must be customized with the Symmetrix device group names. 276 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 15. Apply the Serviceguard configuration using the cmapplyconf command or SAM. 16. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and failover. 17. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply (after recovery cluster is completed in next section) for an example of how to automate this task. The script must be customized with the Symmetrix device group names. The primary cluster is now ready for the Continentalclusters operation. Setting up a Recovery Package on the Recovery Cluster The installation of EMC SRDF, Serviceguard, and Continentalclusters software is exactly the same as in the previous section. The procedures below will install and configure a recovery package on the recovery cluster. Consult the Managing Serviceguard user’s guide for instructions on setting up a Serviceguard cluster (that is, LAN, VG, LV,...etc). 1. 2. Split the EMC SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be edited to refer to the SRDF groups configured and customized with the Symmetrix device group names. Generate a cluster ASCII file. # cmquerycl -n node1 -n node2 -C CClusterNY.ascii Edit the file CClusterNY.ascii. Be sure to select a primary cluster lock disk that is not a lock disk on the recovery cluster. Edits include spreading HEARTBEAT_IP on all user LANs, and setting MAX_PACKAGES. 3. Check the configuration. # cmcheckconf -C CClusterNY.ascii 4. Create the cluster binary. # cmapplyconf -C CClusterNY.ascii 5. Test the cluster. # cmruncl -v # cmviewcl -v Does the cluster come up? If so, then stop the cluster: # cmhaltcl -f 6. Copy the package files from the primary cluster to a bkpkgXXX directory, and rename it to .cntl and _srdf.env. Building a Continental Cluster Solution with EMC SRDF 277 Edit the recovery package control file from the primary cluster for the secondary cluster. Change the subnet, relocatable IP, and nodes. Be sure to set AUTO_RUN to NO in the package ASCII file. 7. Edit the recovery package environment file _srdf.env as follows: a. Add the path for EMC Solutions Enabler software binaries. b. Make sure that all AUTO* variables are uncommented. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/. d. Uncomment the DEVICE_GROUP variable and set them to the Symmetrix device group names given in the symdg list command. 
The DEVICE_GROUP variable may also contain the consistency group name if using a M by N configuration. e. Uncomment the RETRY and RETRYTIME variables. f. Make sure the CLUSTER_TYPE variable is set to “continental”. g. Uncomment the RDF_MODE and set it to “asyc” or “sync” as appropriate to your application. 8. Edit the remaining application package control script variables in the package control script (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these variables. Change the Subnet IP from ftp copy. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/: 9. .cntl (continental cluster package control script) .conf (Serviceguard package ASCII config file) .sh (Package monitor shell script, if applicable) _srdf.env (Metrocluster SRDF environment file) 10. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names. 11. Apply the Serviceguard configuration using the cmapplyconf command or SAM for the recovery cluster. 12. Test the cluster and packages. # cmruncl # cmmodpkg -e bkpkgCCA # cmviewcl -v 278 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Note that cmmodpkg is used to manually start the application packages. Do all application packages start? If so, then issue the following command. # cmhaltcl -f NOTE: Application packages cannot run on R1 and R2 at the same time. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted to prevent data corruption. 13. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the Symmetrix device group names. The recovery cluster is now ready for continental cluster operation. Setting up the Continental Cluster Configuration The procedures below will configure Continentalclusters and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2: “Designing a Continental Cluster”. 1. 2. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names. Generate the Continentalclusters configuration using the following command: # cmqueryconcl -C cmconcl.config 3. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. Note that when data replication is done using EMC SRDF, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step. 4. 5. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software. 
On all nodes in both clusters copy the monitor package files from /opt/cmconcl/ scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages. The monitor package should start automatically when the cluster is formed. Building a Continental Cluster Solution with EMC SRDF 279 6. Apply the monitor package to both cluster configurations. # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config 7. Restore the logical SRDF links for the package. See the script Samples/ post.cmapplyfor an example of how to automate this task. The script must be customized with the appropriate Symmetrix device group names. Example: # Samples/post.cmapply 8. Generate the cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/ cmclconfig nor is there an equivalent file for Continentalclusters. Example: # cmapplyconcl -C cmconcl.config 9. Start the monitor package on both clusters. The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status. 10. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file. 11. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster. 12. View the status of the continental cluster primary and recovery clusters, including configured event data. # cmviewconcl -v The continental cluster is now ready for testing. See Chapter 2: “Designing a Continental Cluster”, section “Testing the Continental Cluster” (page 91). Switching to the Recovery Cluster in Case of Disaster It is vital the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following: • • • During an alert, the cmrecovercl will not start the recovery packages unless the -foption is used. During an alarm, the cmrecovercl will start the recovery packages without the -foption. When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This condition applies not only when no alert or alarm was issued, but also applies to the situation where there was an alert or alarm, but the primary cluster recovered and its current status is Up. Verify SRDF links are Up. 280 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF # symrdf list Failback Scenarios There is no failback counterpart to the “pushbutton” failover from the primary cluster to the recovery cluster. Failback is dependent on the original nature of the failover, the state of primary and secondary Symmetrix SRDF volumes (R1 and R2) and the condition of the primary cluster. In Chapter 2: “Designing a Continental Cluster”, there is a discussion of failback mechanisms and methodologies in the section “Restoring Disaster Tolerance” (page 99). The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as the hardware or networking failures connecting the two sites. 
The following discussion addresses some of those failures and suggests recovery approaches applicable to the environments using data replication provided by Symmetrix Disk Arrays and Symmetrix Remote Data Facility SRDF. Scenario 1 The primary site has lost power, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the Symmetrix or the operating systems of the systems at the primary site. After reception of the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the status of the R1 and R2 paired group volumes. The command symrdflist will display status of the device group. Source (R1) View Target (R2) View MODES ------------------------------------------------------- ----- -----------ST LI ST Standard A N A Logical T R1 Inv R2 Inv K T R1 Inv R2 Inv RDF Pair Device Dev E Tracks Tracks S Dev E Tracks Tracks MDA STATE -------------------------------- -- ------------------------ ----- -----------DEV001 DEV002 009F WD 00A0 WD 0 0 0 NR 00A5 RW 0 NR 00A6 RW 0 0 0 S.. 0 S.. Failed Over Failed Over After power is restored to the primary site, the Symmetrix device groups may be in the status of Failed Over. The procedure to move the application packages back to the primary site are different depending on the status of the device groups. The following procedure applies to the situation where the device groups have a status of “Failed Over”: 1. Halt the Continentalclusters recovery packages at the recovery site. # cmhaltpkg Building a Continental Cluster Solution with EMC SRDF 281 This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the device groups will remain “Synchronized” at the recovery site and “Failed Over” at the primary site. 2. 3. 4. Halt the cluster, which also halts the monitor package ccmonpkg. Start the cluster at the primary site. Assuming they have been properly configured the Continentalclusters primary packages should not start. The monitor package should start automatically. Manually start the Continentalclusters primary packages at the primary site. # cmrunpkg or # cmmodpkg -e The control script is programmed to handle this case. The control script will issue an SRDF failback command to move the device group back to the R1 side and to resynchronize the R1 from the R2 side. Until the resynchronization is complete, the SRDF “read-through” feature will ensure that any reads on the R1 side will be current, by reading data through the SRDF link from the R2 side. NOTE: If the system administrator does not want synchronization performed from the remote (recovery) site, the device groups should be split and recreated manually. 5. 6. Ensure that the monitor packages at the primary and recovery sites are running. Verify device group is synchronized. # symrdf list 7. Manually bring the package back if the package does not come up, and the device group status is “failed over.” # symrdf -g pkgCCB_r1 failback Execute an RDF ’Failback’ operation for device group ’pkgCCB_r1’ (y/[n]) ? y An RDF ’Failback’ operation execution is in progress for device group ’pkgCCB_r1’. Please wait... 
Write Disable device(s) on RA at target (R2)..............Done. Suspend RDF link(s).......................................Done. Merge device track tables between source and target.......Started. Device: 001 ............................................. Merged. Merge device track tables between source and target.......Done. Resume RDF link(s)........................................Done. Read/Write Enable device(s) on SA at source (R1)..........Done. The RDF ’Failback’ operation successfully executed for device group ’pkgCCB_r1’. 282 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 8. During the resync; the status goes from failed over > invalid > SyncInProg. Example: ftsys1a# symrdf list Symmetrix ID: 000183500021 Local Device View -----------------------------------------------------------------------STATUS M O D E S RDF S T A T E S Sym RDF --------- ------------ R1 Inv R2 Inv----------------------Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair --- ---- ----- --------- ------------ ----------- --- ----------------000 001 000 001 R2:2 RW WD RW R2:2 RW WD RW SYN SYN DIS OFF DIS OFF 0 12 0 WD 0 WD RW WD Synchronized Invalid ftsys1a# symrdf list Symmetrix ID: 000183500021 Local Device View ---------------------------------------------------------------------------STATUS M O D E S RDF S T A T E S Sym RDF --------- ------------ R1 Inv R2 Inv ---------------------Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair --- ---- ----- --------- ------------ ----------- --- ---------------000 001 9. 000 001 R2:2 RW WD RW R2:2 RW WD RW SYN SYN DIS OFF DIS OFF 0 2 0 WD 0 WD RW RW Synchronized SyncInProg Halt the recovery cluster and restart it. # cmhaltcl -f (if the cluster is not already down) # cmruncl 10. Verify the data for data consistency and currency. Scenario 2 The primary site Symmetrix experienced a catastrophic hardware failure and all data was lost on the array. After the reception of the Continentalclusters alerts and alarm, the administrators at the recovery site follow prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the status of the Symmetrix SRDF paired volumes. Since the systems at the primary site are accessible, but the Symmetrix is not, the control file will evaluate the paired volumes with a local status of “failed over”. The control file script is programmed to handle this condition and will enable the volume groups, mount the logical volumes, assign floating IP addresses and start any processes as coded into the script. After the primary site Symmetrix is repaired and configured, use the following procedure to move the application package back to the primary site. Building a Continental Cluster Solution with EMC SRDF 283 1. Manually create the Symmetrix device groups and gatekeeper configurations device groups. Re-run the scripts mk3symgrps* and mk4gatekpr* which do the following: # date >ftsys1.group.list # symdg create -type RDF1 pkgCCA_r1 # symld -g pkgCCA_r1 add pd /dev/rdsk/c7t0d0 # symgate define pd /dev/rdsk/c7t15d0 # symgate define pd /dev/rdsk/c7t15d1 # symgate -g pkgCCA_r1 associate pd /dev/rdsk/c7t15d0 2. Halt the Continentalclusters recovery packages at the recovery site. # cmhaltpkg This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. 
The status of the paired volumes will be SPLIT at both the recovery and primary sites. 3. 4. 5. Halt the Cluster, which also halts the monitor package ccmonpkg. Start the cluster at the primary site. Assuming they have been properly configured the Continentalclusters primary packages should not start. The monitor package should start automatically. Since the paired volumes have a status of SPLIT at both the primary and recovery sites, the EMC views the two halves as unmirrored. Issue the following command: # symrdf -g pkgCCB_r1 failback Since the most current data will be at the remote or recovery site, this command to synchronize from the remote site). Wait for the synchronization process to complete before progressing to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step. 6. Manually start the Continentalclusters primary packages at the primary site using # cmrunpkg The control script is programmed to handle this case. The control script recognizes the paired volume is synchronized and will proceed with the programmed package startup. 7. Verify the device group is synchronized. # symrdf list 8. 284 Ensure that the monitor packages at the primary and recovery sites are running. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Maintaining the EMC SRDF Data Replication Environment Normal Startup The following is the normal Continentalclusters startup procedure. On the primary cluster: 1. Start the primary cluster. # cmruncl -v The primary cluster comes up with ccmonpkg up. The application packages are down, and ccmonpkg is up. 2. Manually start application packages on the primary cluster. # cmmodpkg -e 3. Confirm primary cluster status. # cmviewcl -v and # cmviewconcl -v 4. Verify SRDF Links. # symrdf list On the recovery cluster, do the following: 1. Start the recovery cluster. # cmruncl -v The recovery cluster comes up with ccmonpkg up. The application packages (bkpkgX) stay down, and ccmonpkg is up. 2. 3. Do not manually start application packages on the recovery cluster; this will cause data corruption. Confirm recovery cluster status. # cmviewcl -v and # cmviewconcl -v Normal Maintenance There might be situations where a package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the Continentalclusters with EMC SRDF data replication: Building a Continental Cluster Solution with EMC SRDF 285 1. Shut down the package with the appropriate command. Example: # cmhaltpkg 2. Distribute the package configuration changes. Example: # cmapplyconf - P (Primary cluster) # cmapplyconf -P (Recovery cluster) 3. Start up the package with the appropriate Serviceguard command. Example: # cmmodpkg -e (Primary cluster) CAUTION: Never enable package switching on both the primary package and the recovery package. 4. Halt the monitor package. # cmhaltpkg ccmonpkg 5. To apply the new continental cluster configuration. # cmapplyconcl -C 6. Restart the monitor package. 
# cmrunpkg ccmonpkg 286 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 6 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture This chapter describes Three Data Center architecture through the following topics: • • • • • • • Overview of Three Data Center Concepts Overview of HP XP StorageWorks Three Data Center Architecture Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP Configuring an XP Three Data Center Solution HP StorageWorks RAID Manager Configuration Package Configuration in a Three Data Center Environment Failback Scenarios NOTE: For additional information, refer to the Release Notes for your metropolitan and continental cluster products and the documentation for your storage solution. Overview of Three Data Center Concepts A Three Data Center solution integrates Serviceguard, Metrocluster Continuous Access XP, Continentalclusters and HP StorageWorks XP 3DC Data Replication Architecture. This configuration provides high availability in a disaster tolerant solution by using the data consistency of synchronous replication and the long distance capability of Continuous Access journaling replication to protect against local and wide-area disasters. A Three Data Center configuration of consists of two Serviceguard clusters. The first cluster, which is a Metrocluster, has two data centers that make up both the Primary data center (DC1) and Secondary data center (DC2). The second cluster, typically located at a long distance from the Metrocluster sites, is the Third Data Center (DC3). These two clusters are configured in a Continentalclusters environment, which is made up of a Metrocluster configured as a primary cluster and the third data center (DC3) cluster is configured as a recovery cluster as shown in Figure 6-1. A Three Data Center configuration uses HP StorageWorks 3DC Data Replication Architecture in order to replicate data over three data centers. Three Data Center provides complete data currency and protects against both local and wide-area disasters. Also, three Data Center concurrently supports short-distance Continuous Access synchronous replication within the Metrocluster, and long-distance Continuous Access journal replication between the Metrocluster and recovery cluster. Figure 6-1 depicts a Three Data Center solution in which DC1 and DC2 are physically configured as a Metrocluster and DC3 is an independent Serviceguard cluster. The entire environment is configured as a Continentalclusters solution. Within the Metrocluster, packages can failover and failback automatically, but the recovery cluster Overview of Three Data Center Concepts 287 DC3 only supports a semi-automatic package failover. Failing back a package from DC3 to DC1 or DC2 is done manually. See “Failback Scenarios” (page 318) for more information on the failback process. Figure 6-1 Three Data Center Solution Overview MetroCluster Region Single SG Cluster SG Heartbeat and CA-Sync link VOL DWDM XP12000 DC1 VOL DWDM VOL JNL CAJNL Link XP12000 DC2 ICAP Converter JNL FC/IP Converter XP12000 DC3 Continental Region The Three Data Center solution provides the following benefits: • • • • 288 Maintains high performance. Using synchronous replication over a short distance in a Metrocluster environment provides the highest level of data currency and application availability without significant impact to application performance. Allows swift recovery. 
Metrocluster implementation allows for fast automated failovers after a local area disaster occurred. Allows recovery even when a disaster exceeds regional boundaries or extended duration. A wide-area disaster could disable both data centers DC1 and DC2, but with semi-automatic functionality the operations can be shifted to DC3 and continue unaffected by the disaster. Allows for additional staff at the remote data center outside the disaster area. A wide-area disaster affects people located within the disaster area, both professionally and personally. By moving operations out of the main data centers Designing a Disaster Tolerant Solution Using the Three Data Center Architecture to a remotely located recovery data center, operational responsibilities shift to people not directly affected by the disaster. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP A Three Data Center configuration uses a disaster tolerant architecture made up of two data centers which are located locally in a Metrocluster and a third data center located remotely. These form separate Serviceguard clusters, which are configured in a Continentalclusters configuration. This solution is designed to only work with the HP StorageWorks XP Disk Arrays. Primary Data Center (DC1) contains one or more HP-UX servers that are connected to one XP Disk Array located in DC1. The Secondary Data Center (DC2) contains an equal number of HP-UX servers connected to a second XP Disk Array. Continuous Access Synchronous data replication must be established to replicate data between DC1 and DC2. The distance between DC1 and DC2 is limited by Serviceguard heartbeat latency requirements or Continuous Access Sync distance requirements, whichever is smaller. When DC1 and DC2 form a Metrocluster a third site is required where arbitrator and Quorum nodes need to be kept. The arbitrator nodes are needed in order to meet quorum resolution requirements during cluster reformation when all heartbeat networks fail between DC1 and DC2. DC1, DC2 and the arbitrator site must be in the same subnet according to the Serviceguard network. In a Continentalclusters environment, the Metrocluster would be the primary cluster for packages configured in a three data center solution. The Third Data Center (DC3), which is normally located at a long distance from the Metrocluster sites, contains one or more HP-UX servers connected to a third XP Disk Array. These HP-UX servers form a separate Serviceguard cluster and require a quorum server or cluster lock disk. Continuous Access Journal data replication must be established between one of the XP Disk Arrays located in the Metrocluster and the XP Disk Array located in DC3. In a Continentalclusters environment, DC3 is the recovery cluster for packages configured in a three data center solution. It is recommended to maintain a consistent copy of the volume at the remote site, using HP StorageWorks Business Copy XP (BC-XP). This is particularly useful in case of a rolling disaster, which is a disaster that occurs before the cluster is able to recover from a non-disastrous failure. An example is a data replication link that fails, then, as it is being restored and data is being resynchronized, a disaster causes the primary data center to fail resulting in an incomplete resynchronization and inconsistent data at the remote data center. 
In the case of a rolling disaster, Metrocluster Continuous Access XP and the XP Continuous Access software are able to detect that the data is inconsistent and do not allow the application package to start. A good copy of the data must be restored before restarting the application.

The following are additional disaster tolerant architecture requirements for a Three Data Center solution:
• In the disaster tolerant cluster architecture, each Metrocluster data center is expected to be self-contained, so that the loss of one data center does not cause the entire cluster to fail. It is important that all single points of failure (SPOF) be eliminated so that surviving systems continue to run in the event that one or more systems fail.
• The IP network and SAN equipment between and within the data centers are expected to be redundant and routed in such a way that the loss of any one component does not cause the IP network or SAN to fail.
• Exclusive activation must be used for all LVM volume groups or VxVM disk groups associated with packages that use the XP Disk Array.

The following are restrictions in a Three Data Center solution:
• Shared LVM, CVM, and CFS are not supported. The design of the Three Data Center solution assumes that only one system in the cluster has a VG activated at any time.
• Multi-instance applications are not supported.
• Device Group Monitor support is not available. However, packages configured for two data centers can still use the Device Group Monitor feature.
• Continentalclusters bi-directional recovery is not supported.

Figure 6-2 shows a typical configuration of a Disaster Tolerant Three Data Center architecture.

Figure 6-2 Three Data Center Architecture (The figure shows nodes in DC1 and DC2 connected by redundant, differently routed DWDM links carrying the cluster heartbeat and storage connections, an arbitrator or quorum server node at a third location, and DC3 nodes and storage reached over a WAN, with data replication over IP through FC/IP converters.)

Overview of HP XP StorageWorks Three Data Center Architecture

The HP StorageWorks XP Three Data Center architecture enables data to be replicated over three data centers concurrently, using a combination of Continuous Access Synchronous and Continuous Access Journaling data replication. In an XP 3DC design there are two configurations available: Multi-Target and Multi-Hop. The XP 3DC configuration can switch between the Multi-Target and Multi-Hop configurations at any time during normal operation. These configurations may be implemented with either two or three Continuous Access links between the data centers. In the case of two Continuous Access links, one link is a Continuous Access Sync and the other is a Continuous Access Journal data replication link. As both supported configurations use two Continuous Access links, they are also referred to as Multi-Hop-Bi-Link and Multi-Target-Bi-Link. Whether the configuration is multi-hop or multi-target is determined by two factors: where data enters the system (that is, where the application is running) and in what direction the data flows between the XP arrays.
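Because the replication direction changes after a failover, a quick way to see whether the configuration is currently running as multi-target or multi-hop is to check which array holds the P-VOL of each device group. The following is a minimal sketch only, not a required procedure; it assumes a running RAID Manager instance and uses hypothetical device group names that follow the naming convention described later in this chapter (dgOracle for the Continuous Access Sync group, dgOracle_1 for the Continuous Access Journal group):
# pairvolchk -g dgOracle -s
# pairvolchk -g dgOracle_1 -s
Each command reports whether the local volumes of the group are currently P-VOL or S-VOL. If the same array is the P-VOL side of both the Sync and the Journal groups, data is entering the system there and the configuration is multi-target; if the S-VOL side of the Sync group is the P-VOL side of the Journal group, the configuration is multi-hop.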
XP 3DC Multi-Target Bi-Link Configuration In an XP 3DC Multi-Target Bi-Link configuration the data enters the system on a specific XP array and is replicated into multiple directions. One direction is the synchronous Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 291 replication to the XP array in DC2, and the other is the journaling replication to the XP array in DC3. As shown in Figure 6-3, the data is replicated from DC1 to DC2 using Continuous Access Synchronous. The data is also replicated to DC3 using Continuous Access Journaling. Both Continuous Access-Sync and Continuous Access-JNL replication pairs can remain in active (PAIR) status at all times. Figure 6-3 XP Three Data Center Multi-Target Bi-Link Configuration Data Replication 292 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture Figure 6-4 3DC Multi-Hop Bi-Link Configuration Data Replication Three Data Center Multi-Hop Bi-Link Configuration In an XP 3DC Multi-Hop Bi-Link configuration the data enters the system on one XP array, is replicated synchronously to the next XP array, and from there is replicated to the last XP array. Typically, the starting point of the operation indicates the data center or host that runs the application under normal conditions, with the secondary data center being the cluster failover site and the recovery data center being the remote recovery site. As shown in Figure 6-4, data is replicated from DC1 to DC2 using Continuous Access Synchronous. The data is then automatically recorded on DC2 and replicated to DC3 using Continuous Access Journaling. Both Continuous Access-Sync and Continuous Access-JNL replication pairs can remain in active (PAIR) status at all times. No point-in-time operations or scripting operation is necessary to keep data on DC3 up-to-date and available. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 293 Determining whether to setup a Multi-Target or Multi-Hop solution depends on your environment and business requirements. Normally, the system that runs the application the majority of the time, will determine the configuration. However, during a failover on the synchronous replication pair, a switch from Multi-Hop to Multi-Target or from Multi-Target to Multi-Hop occurs. For example, a package initially running on DC1 fails over to DC2, then the data source becomes DC2 in a Multi-Hop scenario. In this case, the data replication is altered to be DC2 -> DC1 and DC2 -> DC3, which changes from Multi-Hop, to Multi-Target Data replication. There are no recommendations on whether to use Multi-Hop rather than Multi-Target Data Replication. Both configurations have their own advantages and disadvantages. For additional documentation refer to HP StorageWorks XP 3DC Data Replication manuals available at www.docs.hp.com. HP StorageWorks Mirror Unit Descriptors Using the XP 3 Data Center Architecture, a volume can be configured to be replicated to up to seven other volumes at a time; three copies can be used for BC replication, three copies for Journal replication and one copy for Sync/Async/Journal replication. A mirror unit descriptor (MU#) is a special index number available with all volumes that provides an individual designator for each copy of the volume. The mirror unit descriptor is provided in the Raid Manager configuration files to indicate the nature of the copy. 
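The following fragment illustrates where the mirror unit descriptor appears in a RAID Manager configuration file. It is a sketch only, with hypothetical port, target ID, and LUN values; the Continuous Access Sync group leaves the MU# column blank, while the Continuous Access Journal group uses one of the h1, h2, or h3 values:
HORCM_DEV
#dev_group      dev_name          port#    TargetID   LU#   MU#
dgOracle        dgOracle_d0       CL1-A    0          1
dgOracle_1      dgOracle_1_d0     CL1-A    0          1     h1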
Out of the seven mirror unit descriptors, three MU#'s are for local replication copies using Business Copy XP, and are represented in the HP StorageWorks RAID Manager XP (RM) configuration file by the values 0, 1, and 2. The RM configuration file assumes an MU# of 0 when no MU# is specified in the configuration file. However, to avoid confusion, the 0 should be explicitly defined in the configuration for Business Copy XP.

The fourth MU# can be used for Continuous Access XP Sync, Continuous Access XP Async, or Journal replication. When it is used for Sync/Async, the MU# can be either 0 or left blank; however, it is always left blank in a Three Data Center environment.

The remaining three MU#'s are for Continuous Access XP Journal replication pairs only. These MU#'s are represented by the values h1, h2, and h3 in the RM configuration file. The XP12000 and XP10000 support only one Continuous Access XP Journal pair per volume at any point in time, and this one pair can use any of the four Continuous Access XP Journal MU#'s. With XP arrays you can use Continuous Access XP Journal in combination with Continuous Access XP Sync to create two independent copies of the same source device on two different target devices. When creating this configuration, the Continuous Access XP Sync replication must use MU# 0 and the Continuous Access XP Journal replication must use one of the remaining three MU#'s for remote replication.

Figure 6-5 shows all the available mirror unit descriptors for each data device, as well as the value to use in the HP StorageWorks RAID Manager XP (RM) configuration file to identify the specific replication instance for the device, and any environment variables (for example, HORCC_MRCF) necessary to address the copy using RAID Manager XP.

Figure 6-5 Mirror Unit Descriptors (The figure shows a P-VOL with a Continuous Access Sync/Async or Journal S-VOL addressed as RM MU# 0 or omitted, three Continuous Access Journal S-VOLs addressed as RM MU# h1, h2, and h3, and three Business Copy S-VOLs addressed as RM MU# 0, 1, and 2 with HORCC_MRCF=1.)

Figure 6-6 depicts typical Three Data Center pair configurations with MU# usage in Multi-Target and Multi-Hop topologies.

NOTE: The MU# h2 device group pair must be defined in the XP Three Data Center configuration, since it is used as a bridge for the remote site pair state query. The MU# h2 device group pair is referred to as a Phantom Device Group because no physical Continuous Access link has been established for it.

Figure 6-6 Mirror Unit Descriptor Usage

Configuring an XP Three Data Center Solution

After the hardware setup is completed for all three data centers, including the data replication links between data centers according to the Multi-Hop-Bi-Link or Multi-Target-Bi-Link configuration, the next step is the software installation and configuration. The cluster software used in a Three Data Center solution includes Serviceguard, Metrocluster Continuous Access XP, Continentalclusters, and HP StorageWorks RAID Manager.

The following steps describe the process for configuring an XP Three Data Center Solution:
1. Creating the Serviceguard Clusters
2. Creating the Continental Cluster
3. Creating the RAID Manager Configuration
4. Creating Device Group Pairs
5. LVM Volume Groups Configuration
6. VxVM Configuration
7. Package Configuration in a Three Data Center Environment

Creating the Serviceguard Clusters

Install Serviceguard on all nodes participating in the 3DC solution, including arbitrator nodes if that arbitration method is used. As previously described, a 3DC solution includes two Serviceguard clusters: the resources in the first two data centers (the Primary and Secondary sites) are managed by one cluster, and the third data center is managed by another cluster. Install Metrocluster Continuous Access XP on all nodes participating in the 3DC configuration. Create the clusters according to the process described in the Managing Serviceguard user's guide.

For the Primary and Secondary data centers, create a single Serviceguard cluster with components on two sites and arbitrator nodes. In a 3DC solution, all packages in this cluster are able to fail over and fail back automatically; this cluster acts as the primary cluster in a Continentalclusters environment for the configured packages. Create another Serviceguard cluster with components in the third data center, as described in the Managing Serviceguard user's guide. This cluster acts as the recovery cluster in the Continentalclusters environment.

Creating the Continental Cluster

Install Continentalclusters software on all nodes participating in the 3DC solution. To configure the continental cluster, follow the process described in Chapter 2: “Designing a Continental Cluster”. Apply the continental cluster configuration. Package recovery groups can be added once all the package configurations are added to the primary and recovery clusters.

HP StorageWorks RAID Manager Configuration

XP RAID Manager host-based software is used to create and manage the device group pairs in a three data center configuration. The following section describes the RAID Manager configuration process.

Creating the RAID Manager Configuration

Use the following steps to create the RAID Manager configuration:
1. Ensure that the XP Series disk arrays are correctly cabled to each host system that will run packages whose data reside on the arrays. Each XP Series disk array must be configured with redundant Continuous Access links, each of which is connected to a different LCP or RCP card. When using bi-directional configurations, where data center A is a backup for data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations in which you want to allow failback.
2. Edit the /etc/services file, adding an entry for the Raid Manager instance to be used with the cluster. The format of the entry is:
horcm /udp
For more detail, see the file /opt/cmcluster/toolkit/SGCA/Samples/services.example
3. Use the ioscan command to determine which devices on the XP disk array have been configured as command devices. There must be two command devices: a primary one and a secondary one.
4. Copy the default Raid Manager configuration file to an instance-specific name.
# cp /etc/horcm.conf /etc/horcm0.conf
5.
Create a minimum Raid Manager configuration file by editing the following fields in the file created in the previous step: • HORCM_MON-enter the host-name of the system on which you are editing and the TCP/IP port number specified for this Raid Manager instance in the /etc/ services file. • HORCM_CMD-enter the primary and alternate link device file names for both the primary and redundant command devices (for a total of four raw device file names). 6. If the Raid Manager protection facility is enabled, set the HORCPERM environment variable to the pathname of the HORCM permission file, then export the variable. # export HORCMPERM=/etc/horcmperm0.conf If the Raid Manager protection facility is not used or disabled, export the HORCPERM environment variable. # export HORCMPERM=MGRNOINST 7. Start the Raid Manager instance by using horcmstart.sh # horcmstart.sh 0 8. Export the environment variable that specifies the Raid Manager instance to be used by the Raid Manager commands, such as with the POSIX shell type. # export HORCMINST= 298 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture For example: # export HORCMINST=0 Next, use Raid Manager commands to get further information from the disk arrays. Verify the software revision of the Raid Manager and the firmware revision of the XP disk array. # raidqry -l NOTE: Check the minimum requirement level for XP, Raid Manager software, and firmware for your version in the Metrocluster Continuous Access XP Release Notes. To view a list of the available devices on the disk arrays use the raidscan command. The raidscan command must be invoked separately for each host interface connection to the disk array. For example, if there are two Fibre Channel host adapters. # raidscan -p CL1-A # raidscan -p CL1-B NOTE: There must also be alternate links for each device, and these must be on different busses inside the XP disk array. For example, these alternate links may be CL2-E and CL2-F. Unless the devices have been previously paired either on this or another host, the devices will show up as SMPL (simplex). Paired devices will show up as PVOL (primary volume) or SVOL (secondary volume). To identify HP-UX device files corresponding to each device represented by CU:LDEV run the following command: # ls /dev/rdsk/* | raidscan -find -fx NOTE: Only OPEN-V LUNs are supported in a three data center configuration. The ioscan output must be checked to verify which LUNs are OPEN-V LUNs. XP arrays (XP 10000/XP 12000 and beyond) support external attached storage devices to be configured as either P-VOL or S-VOL or both of a Continuous Access pair. From a Continuous Access perspective, there is no difference between a pair created from either internal or external devices. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 299 Refer to the HP StorageWorks XP documentation for information on the configuration requirements of external storage devices attached to XP arrays and supported external storage devices at www.docs.hp.com. 9. Determine which devices will be used by the application package. Define a device group that contains all of these devices. For normal Three Data Center operations, a package requires three different device groups for the configuration. For Multi-Hop-Bi-Link and Multi-Target-Bi-Link configurations, two device groups represent real Continuous Access-Sync and Continuous Access-Journal pairs. 
The third is a “phantom” device group that can be used as a bridge to communicate with the far site. In Raid Manager there is a total of three device groups that are independent, and the management on each is done without the knowledge of the other group. The XP Disk Array implements device sharing rules and fails RAID Manager operations whenever a rule is broken. As it is required to configure three different device groups for a package, it is recommended to follow a naming convention. The Continuous Access Sync device group could be named as dg. The Continuous Access Jnl device group could be named as dg_1 and the phantom device group could be named as dg_p. For example, if an Oracle single instance package is configured in a 3DC environment with Multi-Hop-Bi-Link data replication configuration. dgOracle would be the name of Continuous Access Sync device group between DC1 and DC2 dgOracle_1 would be the name of Continuous Access Jnl device group between DC2 and DC3 dgOracle_p would be the name of phantom device group between DC1 and DC3 Edit the Raid Manager configuration file (horcm0.conf) in the above example to include the devices and device group used by the application package. Only one device group may be specified for all of the devices that belong to a single application package. These devices are specified in the field HORCM_DEV. Also complete the HORCM_INST field, supplying the names of only those hosts that are attached to the XP disk array that is remote from the disk array directly attached to this host. The following are sample RAID Manager configuration files given for each data replication configuration. In the sample configuration files, the device groups names has been simplified for more clarity. 300 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture Multi-Target Raid Manager Configuration For a Multi-Target topology, the DC1, is configured as primary site of an application and is the source of the data replicating to the DC2 and DC3 as shown in Figure 6-7. 
Figure 6-7 Multi-Target Bi-Link (1:2)

Sample Raid Manager Configuration on a DC1 NodeA (multi-target bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeA          horcm0    1000         3000

HORCM_CMD
# dev_name
/dev/rdsk/c6t12d0
/dev/rdsk/c9t12d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg            dg_d0      CL3-E   6          5
dg_1          dg_1_d0    CL3-E   6          5     h1

HORCM_INST
# dev_group   ip_address      service
# communicate with DC2 nodes
dg            NodeB.dc2.net   horcm0
# communicate with DC3 nodes
dg_1          NodeC.dc3.net   horcm0

Sample Raid Manager Configuration on a DC2 NodeB (multi-target bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeB          horcm0    1000         3000

HORCM_CMD
# dev_name
/dev/rdsk/c21t8d0
/dev/rdsk/c24t8d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg            dg_d0      CL1-A   13         1
# phantom device group
dg_p          dg_p_d0    CL1-A   13         1     h2

HORCM_INST
# dev_group   ip_address      service
# communicate with DC1 nodes
dg            NodeA.dc1.net   horcm0
# communicate with DC3 nodes
dg_p          NodeC.dc3.net   horcm0

Sample Raid Manager Configuration on a DC3 NodeC (multi-target bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeC          horcm0    1000         200

HORCM_CMD
# dev_name
/dev/rdsk/c6t2d0
/dev/rdsk/c8t2d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg_1          dg_1_d0    CL2-A   0          5     h1
# phantom device group
dg_p          dg_p_d0    CL2-A   0          5     h2

HORCM_INST
# dev_group   ip_address      service
# communicate with DC2 nodes
dg_p          NodeB.dc2.net   horcm0
# communicate with DC1 nodes
dg_1          NodeA.dc1.net   horcm0

Multi-Hop Raid Manager Configuration
Figure 6-8 depicts a Multi-Hop topology where DC1 is configured as the primary site and is the source of the data replicated to DC2 and DC3.
Figure 6-8 Multi-Hop Bi-Link (1:1:1)

Sample Raid Manager Configuration on a DC1 NodeA (multi-hop-bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeA          horcm0    1000         3000

HORCM_CMD
# dev_name
/dev/rdsk/c6t12d0
/dev/rdsk/c9t12d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg            dg_d0      CL3-E   6          5
# phantom device group
dg_p          dg_p_d0    CL3-E   6          5     h2

HORCM_INST
# dev_group   ip_address      service
# communicate with DC2 nodes
dg            NodeB.dc2.net   horcm0
# communicate with DC3 nodes
dg_p          NodeC.dc3.net   horcm0

Sample Raid Manager Configuration on a DC2 NodeB (multi-hop-bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeB          horcm0    1000         3000

HORCM_CMD
# dev_name
/dev/rdsk/c21t8d0
/dev/rdsk/c24t8d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg            dg_d0      CL1-A   13         1
dg_1          dg_1_d0    CL1-A   13         1     h1

HORCM_INST
# dev_group   ip_address      service
# communicate with DC1 nodes
dg            NodeA.dc1.net   horcm0
# communicate with DC3 nodes
dg_1          NodeC.dc3.net   horcm0

Sample Raid Manager Configuration on a DC3 NodeC (multi-hop-bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeC          horcm0    1000         200

HORCM_CMD
# dev_name
/dev/rdsk/c6t2d0
/dev/rdsk/c8t2d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID   LU#   MU#
dg_1          dg_1_d0    CL2-A   0          5     h1
# phantom device group
dg_p          dg_p_d0    CL2-A   0          5     h2

HORCM_INST
# dev_group   ip_address      service
# communicate with DC2 nodes
dg_1          NodeB.dc2.net   horcm0
# communicate with DC1 nodes
dg_p          NodeA.dc1.net   horcm0

Alternative to HORCM_DEV
An alternative to the HORCM_DEV section, which identifies devices by port, Target ID, and LUN ID, is to use HORCM_LDEV with the XP storage serial number and CU:LDEVs.

HORCM_LDEV
# dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU
#dg           dg_0       60095     01:04            0
# The following alternatives are equivalents of the above entry
# dg          dg_0       60095     260              0
# dg          dg_0       60095     0x104            0

The HORCM_LDEV parameters describe a stable LDEV# and Serial# instead of the port#, Target-ID, and LUN used by HORCM_DEV. The Serial# column identifies the serial number of the XP storage array. The LDEV number in the XP storage array can be specified in three formats:
• CU:LDEV in hex, as used by the SVP or Web console. Example for LDEV# 260: 01:04
• LDEV in hex, as used by the inqraid -fx command. Example for LDEV# 260: 0x104
• LDEV in decimal, as used by the inqraid -fx command. Example for LDEV# 260: 260
If the SAN configuration is set up so that all nodes are connected to a particular XP array, the Target IDs and LUN IDs in the HORCM_DEV section will vary depending on the hardware paths to the array. Consider using HORCM_LDEV for better usability and readability, because the HORCM_LDEV section does not vary across the nodes connected to a particular XP array; the Serial# of the XP array and the CU:LDEVs are the same on all nodes.
11. Restart the Raid Manager instance so that the new information in the configuration file is read.
# horcmshutdown.sh
# horcmstart.sh
12. Repeat steps 2 through 11 on each host that runs this particular application package. If a host runs more than one application package, you must incorporate device group and host information for each of these packages.
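Once the instances are running with the completed configuration files, a quick sanity check can confirm that each instance reads its device group definitions before any pairs are created. The commands below are a sketch only; they assume instance 0 and the simplified group names (dg, dg_1) defined for a DC1 node in the samples above, so substitute the groups that are actually defined in the configuration file of the node you are checking. Until paircreate is run, the volumes should be reported as SMPL.
# export HORCMINST=0
# raidqry -l
# pairdisplay -g dg
# pairdisplay -g dg_1
Repeat the check on a node in each data center; the device groups listed must match that node's HORCM_DEV (or HORCM_LDEV) entries.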
NOTE: The Raid Manager configuration file must be different for each host, especially for the HORCM_MON and HORCM_INST fields. Creating Device Group Pairs An application configured for an XP Three Data Center solution contains two device groups; Continuous Access-Sync and Continuous Access-Journal device groups. Both device group pair relations must be established prior to the rest of the configurations and normal package operations. 306 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture For a Multi-Hop data replication topology, first create the Continuous Access-Sync pair. Then, create the Continuous Access-Journal pair after completion of the Continuous Access-Sync pair creation. To create the pairs use the following: • Create Sync pairs of DC1-DC2 device group from any DC1 node: — paircreate -g dg -vl -f data -c 15 or — paircreate -g dg -vl -f never -c 15 • Create Journal pairs of DC2-DC3 once the DC1-DC2 device group pairs are in “PAIR” state from any DC2 node: — paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2 For a Multi-Target data replication topology, the Continuous Access-Sync and Continuous Access-Journal pairs can be created one followed by another. Or they can be created at the same time. Use the following to create the pairs: • Create Sync pairs of DC1-DC2 device group from any DC1 node. — paircreate -g dg -vl -f data -c 15 or — paircreate -g dg -vl -f never -c 15 • Create Journal pairs of DC1-DC3 device group from any DC1 node (You can verify whether they are in a “PAIR” state with the pairdisplay command). — paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2 NOTE: Paired devices must be of compatible sizes and types. Only OPEN-V LUNs are supported for three data center configuration. NOTE: There is no need to issue the “paircreate” command on phantom device groups. Identification of HP-UX device files Before you create volume groups, you must determine the Device Special Files (DSFs) of the corresponding LUNs used in the XP array. To determine the legacy DSFs corresponding to the LUNs in the XP array: # ls /dev/rdsk/* | raidscan -find -fx Following is the output that is displayed: DEVICE_FILE UID S/F PORT TARG LUN SERIAL LDEV PRODUCT_ID /dev/rdsk/c5t0d0 0 F CL3-E 0 0 10053 321 OPEN-3 This output displays the mapping between the legacy DSFs and the CU:LDEVs. In this output the value for LDEV specifies the CU:LDEV without the : mark. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 307 To determine the agile DSFs that are supported from HP-UX 11i v3 and CU:LDEV mapping information run the following command: # ls /dev/rdisk/* | raidscan -find -fx Following is the output that you will see: DEVICE_FILE UID S/F PORT TARG LUN SERIAL LDEV PRODUCT_ID /dev/rdisk/disk232 0 F CL4-E 0 0 10053 321 OPEN-3 NOTE: There must also be alternate links for each device, and these alternate links must be on different busses inside the XP disk array. For example, these alternate links may be CL2-E and CL2-F. LVM Volume Groups Configuration LVM Volume Groups using the application device group must be created (or imported) in all three data centers cluster nodes. Use the same way to create/import the volume groups as being used in a regular two-site Metrocluster setup. Create and export all LV Groups in one of the DC1 nodes and, import all the Volume Groups for the rest of the three data centers cluster nodes. Use the following procedure to create and export volume groups: 1. 
Define the appropriate Volume Groups on all cluster nodes that run the application package. # mkdir /dev/vgxx # mknod /dev/vgxx/group c 64 0xnn0000 Where the VG name and minor number nn are unique for each volume group defined in the node. 2. Create the Volume Group only on one node in primary data center (DC1). Use the following commands: # pvcreate -f /dev/rdsk/cxtydz # vgcreate /dev/vgname /dev/dsk/cxtydz 3. 4. Create the logical volume(s) for the volume group on the node, and create any file systems required. Export the Volume Groups on the node without removing the special device files. # vgchange -a n # vgexport -s -p -m Make sure to copy the mapfiles to all of the three data centers nodes. 5. Import the Volume Groups on all of the other nodes in DC1, DC2 and DC3 and backup the LVM configuration. # mkdir /dev/vgxx 308 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture # mknod /dev/vgxx/group c 64 0xnn0000 # vgimport -s -m # vgchange -a y # vgcfgbackup # vgchange -a n VxVM Configuration Use the following procedure to create disk groups for VERITAS storage. The VxVM root disk group (rootdg) may need to be created depending on the VxVM version. If rootdg is required, make sure it has already been created on the system while configuring the storage. On one node in the primary data center (DC1) do the following: 1. Initialize disks to be used with VxVM by running the vxdisksetup command only on one node. # /opt/VRTS/bin/vxdisksetup -i c5t0d0 2. Create the disk group to be used with the vxdg command only on one node. # vxdg init logdata c5t0d0 3. Verify the configuration. # vxprint -g logdata 4. Use the vxassist command to create logical volumes. # vxassist -g logdata make logfile 2048m 5. Verify the configuration. # vxprint -g logdata 6. Make the filesystem. # newfs -F vxfs /dev/vx/rdsk/logdata/logfile 7. Create a directory to mount the volume group. # mkdir /logs 8. Mount the volume group. # mount /dev/vx/dsk/logdata/logfile /logs 9. Check if file system exits, then unmount the file system. # umount /logs 10. Deport the disk group on the primary node. # vxdg deport logdata Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 309 Package Configuration in a Three Data Center Environment This procedure must be repeated on all the participating nodes for each Serviceguard package. As there are two Serviceguard clusters, packages must be configured individually in each cluster. Customizations include editing a package configuration file and an environment file to set environment variables, and customizing the package control script to include customer-defined run and halt commands, as appropriate. The package control script must also be customized for the particular application software that it will control. Refer to the Managing Serviceguard user's guide for more detailed instructions on how to start, halt, and move packages and their services between nodes within a cluster. 1. Create a directory /etc/cmcluster/ for each package. # mkdir /etc/cmcluster/ 2. Create a package configuration file. # cd /etc/cmcluster/ # cmmakepkg -p .config Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/ .cntl) for the RUN_SCRIPT and HALT_SCRIPTparameters. 3. 4. In the .config file, list the node names in the order in which you want the package to fail over. 
It is recommended for performance reasons, to have the package fail over locally first, then to the remote data center. Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a large enough value to take into consideration the extra startup time required to obtain status from the XP Series disk array. Create a package control script. # cmmakepkg -s .cntl Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. 5. Copy the environment file template /opt/cmcluster/toolkit/SGCA/ xpca.env to the package directory, naming it _xpca.env # cp /opt/cmcluster/toolkit/SGCA/xpca.env \/etc/cmcluster/pkgname/_xpca.env 310 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture NOTE: If you do not use a package name as a filename for the package control script, you must follow the convention of the environment file name. This is the combination of the file name of the package control script without the file extension, an underscore and type of the data replication technology (xpca) used. The extension of the file must be env. The following examples describe how the environment file name should be chosen: Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_xpca.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_xpca.env. 6. Edit the environment file _xpca.env as follows: a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, then uncomment the line in the script. b. Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. Explanation of these variables have been given in the environment template file. Data currency can not be guaranteed in DC3. In order to recover the package on DC3, either the AUTO_NONCURDATA variable should be set to 1 in the package environment file on the DC3 node or the FORCEFLAG needs to be present in the package directory on the DC3 node. c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/ cmcluster/_name, removing any quotes around the file names. d. Uncomment the HORCMPERM variable and use the default value MGRNOINST if Raid Manager protection facility is not used or disabled. If Raid Manager protection facility is enabled set it to the name of the HORCM permission file. e. Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/Continuous Access. f. Uncomment the FENCE variable and set it to either NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array only in the Metrocluster of the package. FENCE level for all data center must be the same, either DATA or NEVER. FENCE level ASYNC must not be used for DC3 configurations, despite the use of Continuous Access journaling device groups. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 311 g. 
If you are using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds. h. Uncomment the CLUSTER_TYPE variable. For package configuration in the primary (DC1) and secondary (DC2) data center nodes, the CLUSTER_TYPE variable should be set to “metro” instead of “continental”. For package configuration in the third data center nodes (DC3), the CLUSTER_TYPE variable should be set to “continental”. i. The 3DC_TOPOLOGY variable provides Three Data Center topology information in terms of the number of physical links, application failover, and initial data replication configurations. The possible values include: • multi-target-bi-link:This value represents Multi-Target Three Data Center configuration with 2 Continuous Access links. • multi-hop-bi-link: This value represents the Multi-Hop Three Data Center configuration with 2 Continuous Access links. NOTE: If the 3DC_TOPOLOGY variable is commented out, it indicates this is a standard 2 data center (Metrocluster) configuration. j. Each value of the following parameters are the list of cluster node names residing in each data center. Node names are comma-separated in each list. The values for these variables help the disaster tolerant software determine the package startup location. The definitions for the DC1, DC2, and DC3 are: • DC1_NODE_LIST=”node1, node2” • DC2_NODE_LIST=”node3, node4” • DC3_NODE_LIST=”node5, node6” DC1 is Primary site of a package DC2 is Hot Standby site of a packageDC3 is Second Standby site of a package k. The values of the following parameters are device group pair names for a specific package and is used to operate on Raid Manager commands. Each package requires three unique device group names. • DC1_DC2_DEVICE_GROUP • DC2_DC3_DEVICE_GROUP • DC1_DC3_DEVICE_GROUP For Three Data Center Multi-Target topology: • DC1_DC3 must use Continuous Access-Journal • DC2_DC3 uses phantom device A typical definition of values would be as follows: 312 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture • • • DC1_DC2_DEVICE_GROUP=”dg” DC2_DC3_DEVICE_GROUP=”dg_p” DC1_DC3_DEVICE_GROUP=”dg_1” For Three Data Center Multi-Hop topology: • DC2_DC3 must use Continuous Access-Journal • DC1_DC3 uses phantom device A typical definition of values would be as follows: • DC1_DC2_DEVICE_GROUP=”dg” • DC2_DC3_DEVICE_GROUP=”dg_1” • DC1_DC3_DEVICE_GROUP=”dg_p” The phantom device group, defined in the RAID Manager configuration files, is used as a bridge to access the remote site pair state. NOTE: See Appendix A of this guide for a full description of these environment variables. l. The values for the following variables are the same as the default values defined in the configuration file: MULTIPLE_PVOL_OR_SVOL_FRAMES_FOR_PKG, HORCTIMEOUT and WAITTIME. These variables may need to be configured based on your particular requirements. m. All the remaining variables in the Environment file may remain commented out. 7. After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script using the following command: (be sure to include the -n option to perform syntax checking only)# sh -n If any messages are returned, correct the syntax errors. 8. 9. 
Distribute the Metrocluster/Continuous Access configuration, environment and control script files to all the other nodes in the three data centers. Verify that each node in all data centers has the following files in the directory /etc/cmcluster/: • .cntl Seviceguard package control script. • _xpca.env Metrocluster/Continuous Access environment file. • .config Serviceguard package ASCII configuration file. • Any other files/scripts you use to manage Serviceguard packages. Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP 313 10. Check the configuration using the cmcheckconf -P .configin each cluster, then apply the Serviceguard configuration using the cmapplyconf -P .configcommand or SAM. 11. Once all the package configurations are added to the individual clusters, edit the Continentalclusters configuration file to add recovery groups. Then check the configuration file syntax by using the following command: # cmcheckconcl -v -C \ To apply the configuration run the following: # cmapplyconcl -C For further details about the Continentalclusters configuration, see Chapter 2: “Designing a Continental Cluster”. Timing Considerations In a journal device group, many journal volumes can be configured to hold a significant amount of the journal data (host-write data). The package startup time may increase significantly when a package fails over. Delays in package startup times will occur in these situations: 1. Recovering from a broken pair affinity. On failover, the SVOL pulls all the journal data from the PVOL site. The time needed to complete all data transfer to the SVOL depends on the amount of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links. 2. Host I/O is faster than the Continuous Access Journal data replication. The outstanding data not being replicated to the SVOL is accumulated in journal volumes. Upon package failover to the SVOL site, the SVOL pulls all the journal data from PVOL site. The time to complete all the data transfer to the SVOL depends on the bandwidth of the Continuous Access links and amount of outstanding data in the PVOL journal volume. 3. Failback - When systems recover from previous failures, a package can be failed back, within Metrocluster data centers, by manually issuing a Serviceguard command. When a package failback is triggered, the software needs to ensure the application data integrity. It executes storage preparation actions to the XP array, if it is necessary, prior to the package startup. NOTE: Do not use Serviceguard to failback to DC3. You need to take manual steps to replicate data back from DC3. See “Failback Scenarios” (page 318). In a three data center configuration whenever a package tries to start up a RAID Manager instance on a host, that host communicates with other RAID Manager instances in different data centers. In case the other data center RAID Manager instances are down it will wait for a time out value that is configured in the RAID Manager configuration file for each data center. In this scenario to reduce package startup time, 314 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture set the instance timeout value under the HORCM_MON section of the RAID Manager instance configuration file to a low, but safe value. 
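For example, the following HORCM_MON entry is a sketch only; an appropriate value depends on your network and Continuous Access link behavior, so test any reduced value before relying on it. It lowers the instance timeout from 3000 (30 seconds, the value used in the DC1 and DC2 samples earlier in this chapter) to 500 (5 seconds) while leaving the polling interval unchanged:

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeA          horcm0    1000         500

After changing the value, restart the instance with horcmshutdown.sh and horcmstart.sh so that the new timeout takes effect. A timeout that is too low can cause communication with a remote Raid Manager instance to be reported as failed during ordinary transient network delays.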
Bandwidth for Continuous Access and Application Recovery Time When a disaster event in the entire Metrocluster causes an application package to be manually failed over to the recovery site (the third data center), the Continentalclusters and storage software performs the following actions: • Perform a takeover by issuing a command to the third data center XP array via RAID Manager. This changes the XP disk devices that are used by the application from Read Only to Read/Write mode. If the PVOL site XP array is still up, it will flush all of the outstanding data in its journal volumes to the local XP array as a part of the takeover. Depending upon the bandwidth of the Continuous Access links and the amount of outstanding data, the takeover operation may take some time. This time value is referred to as TakeOverTime. • Activates the volume group(s). The time for this is minimal; normally within a few seconds per volume group. • Check and mount any file systems if file systems are used. If Continuous Access data replication has not failed, it should not take much time to check the file system. If Continuous Access data replication did fail, it would require additional time to repair any file systems. This time value is referred to as CheckandRepairTime. • Add any package IP addresses. The time for this is minimal; normally within a second. • Start the package application(s). If the application requires a database recovery, it may take time before the application(s) is finally up and running. This time value is referred to as AppRecoveryTime. The total application recovery time is equal to TakeOverTime + CheckandRepairTime + AppRecoveryTime. During the planning phase for the cluster, the sizing of the link bandwidth for Continuous Access should take the time value for TakeOverTime into consideration. During the implementation phase for the cluster, tests should be executed to measure the total time, TakeOverTime, it would take to failover including flush a full set of journal volumes from the PVOL site XP array to the SVOL site XP array. The HORCTIMEOUT environment variable in the package's environment file should be configured greater than or equal to this time value. The HORCTIMEOUTvalue is used by the RAID Manager takeover command to determine the maximum amount of time to allow for the takeover to complete. Bandwidth for Continuous Access and Application Recovery Time 315 NOTE: If the HORCTIMEOUT parameter configured is too short, the time allowed for the take over operation to complete will expire before the primary XP array has flushed all of the outstanding data in it’s cache to the secondary XP array. This will cause the takeover action to fail, and the package will fail to start. When this happens, an error message will be logged in the control script log file with instructions on what to do next. Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover The following section describes data maintenance in the event of a Swap Takeover in a Metrocluster Continuous Access XP environment. Swap Takeover Failure (for Continuous Access Sync Pair) When a Continuous Access-Sync device group pair state is SVOL-PAIR in local site and PVOL-PAIR in remote site (assume the Continuous Access-Journal pair is in normal state - PVOL_PAIR/SVOL_PAIR), the Three Data Center software performs a swap takeover. The swap takeover would fail if there is an internal (unseen) error (that is, cache or shared memory failure) in the device group pair. 
In this case, if the AUTO_NONCURDATA is set to 0, the package will not be started and the SVOL state is change toSVOL_PSUE (SSWS) by the takeover command. The PVOL site either remains in PVOL_PAIR or is changed to PVOL_PSUE. The SVOL is in SVOL_PSUE(SSWS) meaning that the SVOL is read/write enabled and the data is usable but not as current as PVOL. In this case, user can either (1) use FORCEFLAG to startup the package on SVOL site or (2) fix the problem and resume the data replication with the following procedures: For Multi-Hop Topology: 1. Split the Continuous Access-Sync device group pair completely (pairsplit -g dg -S). 2. Split the Continuous Access-Journal device group pair completely (pairsplit -g dg_1 -S). 3. Re-create the Continuous Access-Sync pair from original PVOL as source. (use paircreate command). 4. Re-create the Continuous Access-Journal pair from original PVOL as source. (use paircreate command). 5. Startup package on its primary site For Multi-Target Topology: 316 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture 1. 2. 3. Split the Continuous Access-Sync device group pair completely (pairsplit -g dg -S). Re-create the Continuous Access-Sync pair from original PVOL as source. (use paircreate command) Startup package on its primary site. Takeover Timeout (for third data center) When a package is being failed over to the third data center (SVOL of the Continuous Access-Journal device group), the Metrocluster toolkit script issues takeover command on the SVOL. If the journal group pair is flushing the journal data from its PVOL to SVOL and takeover timeout occurs, the following situations would happen: 1. The device group pair state remains in PVOL-PAIR/SVOL-PAIR 2. The journal data continues transferring to the SVOL In this case, you must wait for the completion of the journal data flushing and the Continuous Access-Journal pair state to be: • Hot-standby site: PVOL-PAIR or PVOL-PSUS(E) • Third site: SVOL-PSUS(SSWS) or SVOL-PSUE(SSWS) Either 1) use FORCEFLAG to startup the package on third site or 2) fix the problem (if any of Continuous Access links was failed) and resume the data replication with the following procedures: 1. split the Continuous Access-Journal device group pair completely (pairsplit -g -S) 2. re-create a Continuous Access-Journal pair from it’s original PVOL as source. (use the paircreate command) NOTE: You can specify “none” as the copy mode for initial copy operations. If the none mode is selected, full copy operations are not performed. The user is responsible for using the none mode only when the user is sure that data in the primary data volume is exactly the same as data in the secondary data volume. Continuous Access-Journal Device Group PVOL-PAIR with SVOL-PSUS(SSWS) State PVOL-PAIR with SVOL-PSUS(SSWS)is an intermediate state. It could happen, but is unlikely to be seen. The state PVOL-PAIR/SVOL-SSWS is an invalid state for XP Continuous Access Journal. In this state, if you issue a pairresync or takeover command, it would fail. It is necessary to wait for the PVOL to become PSUE or PSUS. Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover 317 Failback Scenarios This section describes the procedures for the following failback scenarios in a Three Data Center environment. 
• • MULTI-HOP-BI-LINK (DC1 > DC2 > DC3) Package Failback from DC3 to DC1 MULTI-TARGET-BI-LINK (DC2 > DC1 > DC3) Package Failback from DC3 to DC1 Failback from Data Center 3 (DC3) In the event of a disaster at the Metrocluster sites, the package fails over to DC3 in the recovery cluster. Use the following steps to move the packages back to DC1: 1. 2. 3. 4. 5. 6. 7. 8. 9. Verify all the nodes in DC1, DC2 and DC3 are up and running. Start DC1-DC2 cluster if it is not running. Start all RAID Manager instances on each node in DC1, DC2 and DC3. Verify all the Continuous Access links are up. Halt the package if it is running on DC3. Recover the latest data from DC3. Change the Cluster ID if the package is using LVM. Enable the package on all nodes in DC1 and DC2. Start the package on its primary node. Recovering the latest data from DC3, as described in Step 6 above, guarantees that before the packages are run on DC1, the latest data from DC3 is replicated on to DC1. The process for this may vary depending on whether the 3DC configuration uses either Multi-hop or Multi-target topology. The next section lists the required commands to perform this process, which correspond to both the configurations, and are relavent only to the recovery process of the latest data from DC3 to DC1. MULTI-HOP-BI-LINK (DC1 > DC2 > DC3) Data Recovery from DC3 to DC1 The following describes the process to restart a package back to DC1 after the package fails over to DC3: 1. Log on to any node at DC2 and perform the following: a. Check the pair status of the Sync device group: # pairvolchk -g dg -s b. If the local dg volume is in PVOLor SVOL-SSWS. # pairsplit -g dg Go to Step 2 Or 318 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture c. If local dg volume is other than PVOL or SVOL-SSWS perform a SVOL takeover to make the local volume SVOL-SSWS or PVOL # horctakeover -g dg -S # pairsplit -g dg d. If the above command fails, then split the pair to SMPL state. # pairsplit -g dg -S 2. Resync data from DC3 to DC2. Log onto any node at DC2. Resync Journal device group to get the latest data from DC3 to DC2. a. Check the pair status of dg_1 (DC3 side). # pairvolchk -g dg_1 -s -c b. If dg_1 (DC3 side) is in PVOL. # pairresync -g dg_1 Or If dg_1 (DC3 side) is in SVOL-SSWS. # pairresync -g dg_1 -c 15 -swapp c. Wait for PAIR state to come up for the Journal device group. # pairevtwait -g dg_1 -t 300 -s pair d. Swap the role of Journal device group between DC2 and DC3 # horctakeover -g dg_1 -t 360 e. Wait for the Sync device group to attain the PAIR state. # pairevtwait -g dg_1 -t 300 -s pair Failback Scenarios 319 3. Resync sync device group to get latest data from DC2 to DC1. • If in Step 1, the dg pair has been brought to SMPL state. a. Create the DC1 and DC2 Sync device group. # paircreate -g dg -f never/data -c 15 -vl b. Wait for the PAIR state to come up for the Sync device group. # pairevtwait -g dg -t 300 -s pair Go to Step 4 • If in Step 1 the dg pair was not split to SMPL and the dg local volume is PVOL. a. If dg (DC1) is in SVOL-SSWS. # pairsplit -g dg -RB # pairsplit -g dg b. Resync the device group dg # pairresync -g dg c. Wait for the Sync device group to attain the PAIR state. # pairevtwait -g dg -t 300 -s pair Go to Step 4 • If in Step 1 the dg pair was not split to SMPL and dg local volume is SVOL-SSWS. # pairresync -g dg -c 15 -swaps a. Wait for the Sync device group to attain the PAIR state. # pairevtwait -g dg -t 300 -s pair 4. Log on and perform the following from any DC1 node. a. 
SWAP the roles of the sync device group between DC1 and DC2. # horctakeover -g dg 5. Log on and perform the following from any DC2 node: a. Wait for PSUS to come up for the Journal Device Group. # pairevtwait -g dg_1 -t 300 -s psus b. Resync the Journal device group. # pairresync -g dg_1 c. Wait for the PAIR state to come up for the Journal device group. # pairevtwait -g dg_1 -t 300 -s pair 320 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture MULTI-TARGET-BI-LINK (DC2 > DC1 > DC3) Data Recovery from DC3 to DC1 Use the following steps to restart a package back to DC1 after the package fails over to DC3: 1. Log on to any node on DC1. a. Check the pair status of the Sync device group. # pairvolchk -g dg -s b. If the local dg volume is inPVOL or SVOL-SSWS # pairsplit -g dg Go to Step 2 • If the local dg volume is other than SVOL-SSWS or PVOL perform an SVOLtakeover to make the local volume SVOL-SSWS or PVOL. # horctakeover -g dg -S # pairsplit -g dg Go to step 2, • If above command fails, split the pair to SMPL. # pairsplit -g dg -S 2. Resync data from DC3 to DC1. Resync the Journal device group to get the latest data from DC3 to DC1. a. Check the pair status of dg_1 at DC3. # pairvolchk -g dg_1 -s -c b. If dg_1(DC3) is in PVOL # pairresync -g dg_1 Or If dg_1(DC3) is in SVOL-SSWS. # pairresync -g dg_1 -c 15 -swapp c. Wait for the PAIR state to come up for the Journal device group. # pairevtwait -g dg_1 -t 300 -s pair d. Swap the Journal device group role between DC1 and DC3. # horctakeover -g dg_1 -t 360 e. Wait for PAIR state to come up. # pairevtwait -g dg_1 -t 300 -s pair 3. Resync the device group to get latest data from DC1 to DC2. • If in step 1 the dg has been brought to the SMPL state Failback Scenarios 321 a. Create the DC1 and DC2 Sync device group. # paircreate -g dg -f never/data -c 15 -vl b. Wait for PAIR state to come for the Sync device group. # pairevtwait -g dg -t 300 -s pair • If in Step 1 the dg pair was not split to SMPL and the dg local volume is PVOL. a. If dg (DC2) is in SVOL-SSWS. # pairsplit -g dg -RB # pairsplit -g dg b. Resync the device group dg # pairresync -g dg c. Wait for PAIR state to come up. # pairevtwait -g dg -t 300 -s pair • If in step 1, the dg pair was not split to SMPL and the dg local volume is SVOL-SSWS #pairresync -g dg -c 15 -swaps a. Wait for PAIR state to come up # pairevtwait -g dg -t 300 -s pair NOTE: Refer to “HP StorageWorks RAID Manager XP User's Guide” for explanation of different command options. Additional Reading The following documents contain additional useful information: • • • Managing Serviceguard Twelfth Edition (B3936-90100) Understanding and Designing Serviceguard Disaster Tolerant Architectures (B7660-90018) Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters (B7660-90019) Use the following URL to access HP’s High Availability web page: • http://www.hp.com/go/ha Use the following URL for access to a wide variety of HP-UX documentation: • http://docs.hp.com/hpux To learn more about HP StorageWorks XP disk arrays, contact your local HP storage representative. 322 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture 7 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture This chapter describes Site Aware Disaster Tolerant Architecture (SADTA) for deploying complex multi-instance workloads such as Oracle Database 10gR2 RAC for disaster tolerance in Metrocluster. 
Special software features such as Site Controller package, and Site Safety Latch, provide robust automatic failover of the multi-instance workload between the two sites. This chapter addresses the following topics: • • • • • • • Overview of Site Aware Disaster Tolerant Architecture SADTA and Oracle Database 10gR2 RAC Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture Understanding Site Failover in a Site Aware Disaster Tolerant Architecture Administering the Site Aware Disaster Tolerant Metrocluster Environment Limitations of a Site Aware Disaster Tolerant Architecture Troubleshooting Overview of Site Aware Disaster Tolerant Architecture SADTA is an architecture that enables deploying complex workloads in a Metrocluster. Complex workloads are applications that are configured using multiple inter-related MNP packages that must be managed collectively. For example, Oracle Database 10gR2 RAC is a complex workload. In SADTA, components such as Site Controller package and Site Safety Latch enable robust automatic site failover of these complex workloads. In SADTA, isolated sub-clusters, such as Oracle Clusterware or Serviceguard Storage Management Suite (SG SMS) Cluster File System, are created at each site within the Metrocluster. Sub-clusters are clusterwares that run above the Serviceguard cluster and comprise only the nodes in a Metrocluster site. Sub-clusters have access only to the storage arrays within that site. Complex workloads can be configured redundantly over sub-clusters at each site in the Metrocluster. Components of SADTA start and monitor the workload configuration at one sub-cluster. When the workload configuration in a sub-cluster at one site stops running, SADTA components failover the workload to a sub-cluster at the other site. A built-in mechanism in SADTA prevents the redundant workload configuration from running simultaneously on multiple sub-clusters. This feature is available with Metrocluster EVA, Metrocluster with EMC SRDF, and Metrocluster with Continuous Access XP. The SADTA feature requires additional software products to be installed in the Metrocluster. For more information on the Overview of Site Aware Disaster Tolerant Architecture 323 required software and supported versions, see the Disaster Tolerant Clusters Products Compatibility Feature Matrix available at http://docs.hp.com. NOTE: Currently, only the Oracle Database 10gR2 RAC workload is supported in SADTA. Components of SADTA This section describes the components of SADTA. Following are the components of this feature: • • • • • • “Site” (page 324) “Oracle Clusterware Sub-cluster” (page 325) “Cluster File System Sub-cluster” (page 326) “Complex Workload Packages” (page 326) “Site Controller Package” (page 327) “Site Safety Latch” (page 330) Site Site, in SADTA, is a collection of Metrocluster nodes in the same location that are connected to the same disk array. The site information is configured in the cluster configuration file. Sub-clusters, such as Oracle Clusterware or SG SMS Cluster File System, are formed using this site information. The Serviceguard cluster configuration file includes the following new attributes to define sites: • SITE_NAME To define a unique name for a site in the cluster. • SITE To associate a node to a site. This must be defined with the NODE_NAME attribute of the respective nodes. 
Following is a sample of the site definition in a Serviceguard cluster configuration file: SITE_NAME san_francisco SITE_NAME san_jose NODE_NAME SFO_1 SITE san_francisco ... NODE_NAME SFO_2 SITE san_francisco ... NODE_NAME SJC_1 SITE san_jose ... 324 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture NODE_NAME SJC_2 SITE san_jose When sites are configured in a cluster, the cmviewcl command lists the cluster nodes information according to each site, starting with the site name. HP Serviceguard enables this site feature only when the Metrocluster software, with the Site Controller functionality, and SGeRAC is installed on all nodes in the cluster. If the necessary software products are not installed, the cmapplyconf and cmcheckconf commands will fail and one of these error messages is displayed: SGeRAC sub-clustering functionality is not installed. or Metrocluster Site Controller feature is not installed. Oracle Clusterware Sub-cluster Oracle Clusterware is layered above the Serviceguard cluster membership. The Oracle clusterware sub-cluster is formed with membership exclusively from a set of nodes in an underlying site definition in the Serviceguard cluster. The Oracle clusterware sub-clusters must be configured separately at each site using Oracle Universal Installer. While installing and configuring the Oracle clusterware sub-cluster, only the nodes that belong to the respective site must be selected. At each site sub-cluster, the Oracle Clusterware daemons are configured as MNP packages using the SGeRAC toolkit. The Oracle Cluster Registry (OCR) and Voting disk storage for an Oracle clusterware sub-cluster must not be replicated between the sites. In some scenarios, such as the cross-subnet configuration, Serviceguard Extension for RAC (SGeRAC) recommends network configurations where the Oracle Clusterware heartbeat network does not coincide with the Serviceguard heartbeat network. In such cases, the Oracle Clusterware heartbeat subnet must be monitored by configuring it as CLUSTER_INTERCONNECT_SUBNET in a separate MNP package for each site. The Oracle clusterware MNP package must be made dependent on this package. An Oracle clusterware sub-cluster can be started and stopped independently of the Oracle clusterware sub-cluster on the other site. The Oracle clusterware sub-cluster is a common infrastructure for all Oracle RAC databases configured at the site. Any number of RAC databases local to a site can be created and registered with the Oracle clusterware sub-cluster of the site. Overview of Site Aware Disaster Tolerant Architecture 325 NOTE: The Oracle clusterware sub-cluster functionality is supported only in a Metrocluster environment that uses SGeRAC and requires the underlying Serviceguard cluster configuration to have the sites defined appropriately. Cluster File System Sub-cluster The Cluster File System is layered above the Serviceguard membership. The Cluster File System (CFS) sub-cluster is formed with membership from the cluster nodes in a site as defined in the underlying cluster. The CFS sub-cluster at a site manages the disks connected to the nodes in that site. If you have CVM or CFS configured in your environment, a CFS sub-cluster must be present at each site. The CFS sub-clusters at each site have their own namespaces. As a result, the same Cluster Volume Manager (CVM) disk group name can be used in both the sites in a Metrocluster. 
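As a simple sanity check (not part of the documented procedure), you can confirm that each Oracle Clusterware sub-cluster was formed only from its own site's nodes by listing the clusterware membership from a node at each site. The sketch below assumes Oracle Clusterware 10gR2 with its bin directory in the PATH and uses the node names from the site definition example earlier in this chapter.
From a node at the first site:
# olsnodes
From a node at the second site:
# olsnodes
The first command should report only that site's nodes (for example, SFO_1 and SFO_2), and the second only the other site's nodes (SJC_1 and SJC_2). If a node from the remote site appears in either list, the clusterware sub-cluster was not installed against the site-local node set.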
When the sites are defined in the Serviceguard cluster configuration, the CFS sub-clusters are formed automatically when the cluster starts. The CFS sub-cluster daemons are packaged in the regular cluster-wide System Multi Node package (SMNP): SG-CFS-pkg. The SG-CFS-pkg is a single SMNP package configured across both the site nodes in a Metrocluster. The SMNP package instances form or join the corresponding site CFS sub-cluster automatically. The SG SMS commands, CFS commands, and other utilities operate within the CFS sub-cluster, where they are executed. The CFS sub-clusters manage the cluster file systems on disk arrays that are local to a site. NOTE: The CFS sub-cluster functionality is supported only in a Metrocluster environment with sites defined in the underlying cluster. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. Complex Workload Packages A complex workload is a multi-instance application that uses active resources across multiple nodes in a cluster. These workloads are configured using multiple, inter-dependent multi-node packages in Serviceguard. The workloads need to be managed and moved collectively for disaster tolerance. For SADTA, the complex workload is configured at each site sub-cluster. The configuration at each site is packaged using different MNP packages. 326 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Figure 7–1 is an example of a complex workload comprising an Oracle RAC database MNP and its corresponding CFS Mount Point and Disk Group MNP packages. The arrows in the figure indicate the package dependencies. Figure 7-1 Complex Workload with Package Dependencies Configured Site Controller Package The Site Controller package is a Serviceguard failover package. It provides coordinated automatic site failover for the configured complex workload. The inter-dependent packages of the complex workload at each site is configured with the Site Controller package. This Site Controller package starts the workload packages on the site that it is running on and monitors them. When the monitored packages fail, or if the site hosting the packages fail, the Site Controller package fails over to the remote adoptive site node and performs a site failover. As part of the failover, the Site Controller package ensures that the workload packages on the failed site are down and then starts the corresponding package on the other site. This results in a failover of the complex workloads across sites. The Site Controller package is created using a Serviceguard modular package module: /dts/sc. To create a Site Controller package, the following conditions must be met: 1. 2. The AUTO_RUN variable is set to NO. The NODE_NAME variable is specified explicitly with * which indicates all node names in the Metrocluster. The NODE_NAME variable must be specified in an order where all the nodes of the preferred site appear before the remote adoptive site nodes. Overview of Site Aware Disaster Tolerant Architecture 327 The Site Controller package can manage only MNP type packages. The MNP packages of all workloads at all sites is specified in the Site Controller package, and are grouped by site. Use the managed_package and critical_package attributes under the site attribute to list the workload packages in that site. 
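As an illustration of these two requirements, the opening of a Site Controller package configuration file might look like the following sketch. The package name hrdb_sc and the node names are placeholders taken from the examples in this chapter, and the attributes are shown in the usual Serviceguard modular package spelling; generate the actual template with cmmakepkg using the dts/sc module and keep whatever additional attributes it contains.

package_name    hrdb_sc
package_type    failover
auto_run        no
# Nodes of the preferred site (san_francisco) are listed first,
# followed by the nodes of the remote adoptive site (san_jose).
node_name       SFO_1
node_name       SFO_2
node_name       SJC_1
node_name       SJC_2

With auto_run set to no, the Site Controller package does not start automatically when the cluster starts; it is started explicitly (for example, with cmrunpkg), which in turn starts the workload packages on that site.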
Configure the site attribute using values that match the site definition in the underlying Serviceguard cluster configuration file. Multiple managed packages and only one critical package can be configured under each site attribute. When the Site Controller package is started on a site, it starts all the workload packages configured with it, on that site. The Site Controller package then monitors the workload package that is configured as a critical_package on the site. When no critical_package is configured, the Site Controller package monitors all the configured managed packages. The Site Controller package initiates a failover when all the nodes in the current site have failed, or when the monitored packages have failed and the following conditions are met: 1. 2. All monitored packages are down, having failed in the cluster. All instances of all monitored packages have halted clean. This implies that the halt scripts have been successfully executed for all instances for all monitored packages. The Site Controller package initiates a site failover only when the workload has failed. So, manually halting workload packages does not initiate a site failover. When a monitored MNP package instance has not halted successfully, stray resources could still be online. The node switching capability of the MNP package instance on that node is disabled. The node cannot be removed from the cluster using the cmhaltnode or cmhaltcl commands. Operators or administrators must clean any stray resources of the failed MNP package instance; and enable node switching to restart the workload packages on the site. Table 7–1 describes the packages that are monitored by the Site Controller package and instances when site failover is initiated. Table 7-1 Packages Monitored by the Site Controller Package Site Controller Package Configuration What is Monitored critical_package configured Only the package configured as critical_package critical_package is not configured 328 All packages configured as managed_package Site Failover is Initiated When the configured critical_package meets condition 1 and 2 When all the configured managed_package meets condition 1 and 2. Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture To perform a site failover, the Site Controller package first fails over to a node in the adoptive site and then initiates a site failover as part of its own startup. Before initiating a site failover, the Site Controller package first ensures that the nodes of the failed site are no longer in the cluster or all the workload packages at the failed site are down and halted clean. After ensuring these conditions, the Site Controller package prepares the replicated disks at the adoptive site and then starts the workload packages. If there are any errors in this sequence of steps, the Site Controller package fails at startup at the adoptive site. 
Following is a sample of a typical disaster tolerant RAC database that is configured in its Site Controller package configuration file: site san_francisco critical_package sfo_hrdb # RAC MNP at San Francisco site managed_package sfo_hrdb_mp # CFS MP MNP at San Francisco site managed_package sfo_hrdb_dg # CVM DG MNP at San Francisco site site san_jose critical_package sjc_hrdb # RAC MNP at San Jose site managed_package sjc_hrdb_mp # CFS MP MNP at San Jose site managed_package sjc_hrdb_dg # CVM DG MNP at San Jose site In this example, the Site Controller package initiates a site failover to the san_jose site, when the package configured as the critical_package, in this case sfo_hrdb, is down after failing on the site. When there is no critical package configured; or when all packages of the workload are configured as managed packages, the Site Controller package initiates a site failover only when all the workload packages on the site have failed. Following is an example of a Site Controller package configuration file where all the packages in the workload are configured as a managed_package. site san_francisco managed_package sfo_hrdb # RAC MNP at San Francisco site managed_package sfo_hrdb_mp # CFS MP MNP at San Francisco site managed_package sfo_hrdb_dg # CVM DG MNP at San Francisco site site san_jose managed_package managed_package managed_package sjc_hrdb # RAC MNP at San Jose site sjc_hrdb_mp # CFS MP MNP at San Jose site sjc_hrdb_dg # CVM DG MNP at San Jose site In this example, the Site Controller package initiates a site failover to the san_jose site, when all the configured managed packages, in this case, sfo_hrdb, sfo_hrdb_mp, and sfo_hrdb_dg, are down, having failed in the cluster. In both cases, the Site Controller package, from the adoptive site node, ensures that all the workload packages on the failed site, including sfo_hrdb, sfo_hrdb_mp, and sfo_hrdb_dg, are down and have halted clean. When a critical package is configured and if any of the managed packages is not down, the Site Controller package halts them. Overview of Site Aware Disaster Tolerant Architecture 329 Following are the other attributes that are required to be configured in a Site Controller package: • monitor_interval This attribute specifies the time interval, in seconds, at which the Site Controller package monitors the workload package status. The default value is 30 seconds. Values lesser than 30 seconds cause the Site Controller package to check the monitor package status more frequently. • dts_pkg_dir This attribute specifies the absolute path to the Site Controller package directory. This directory must be present in all the nodes in the Metrocluster. The Site Controller package looks for the Metrocluster environment file in this directory. All Metrocluster flag files should be touched under this directory. In SADTA, each configured complex workload has its own Site Controller package. Any number of Site Controller packages can be configured in a Metrocluster. So, any number of complex workloads can be configured in a Metrocluster. Site Safety Latch The Site Safety Latch prevents inadvertent simultaneous startup of the workload configuration on both sites. The Site Safety Latch is an internal mechanism that is created for each Site Controller package. It is created automatically on all the cluster nodes when the corresponding Site Controller package configuration is applied in the cluster. The Site Safety Latch for a Site Controller package is identified by the following convention: /dts/mcsc/. 
A workload that is managed by a Site Controller package, must be configured to use the corresponding Site Safety Latch. Packages that are the foremost predecessors in the dependency order among the workload packages, must be configured with a resource dependency on the Site Safety Latch. In addition, workload packages that are not dependent on any other workload packages must also be configured with a resource dependency on the Site Safety Latch. The resource dependency must be specified in the package configuration file. The UP value of the RESOURCE_UP_VALUE attribute must be specified as != DOWN and the value for the RESOURCE_START attribute must be left at its default value of Automatic in the configuration file. Following is a sample of the resource dependency specification: RESOURCE_NAME /dts/mcsc/hrdb_sc RESOURCE_POLLING_INTERVAL 40 RESOURCE_UP_VALUE != DOWN RESOURCE_START = AUTOMATIC Based on the Site Safety Latch configuration rules, when configuring an Oracle Database 10gR2 RAC using CFS or CVM storage in a Metrocluster, the resource dependency 330 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture must be defined in the MNP package of each CVM disk group that is used to store the RAC database. When using SLVM for storage, the resource dependency must be specified in the SGeRAC Toolkit RAC MNP package. The Site Safety Latch can either be in the Open or Closed state. The workload packages on a site can be started only when the Site Safety Latch is Open on the site. When the Site Safety Latch on a site is Closed, the corresponding workload packages cannot be started. Each Site Controller package manages its own Site Safety Latch such that only one Site Safety Latch is open at any given time. When the Site Controller package encounters an error while starting the RAC MNP stack packages configured on that site, the Site Safety Latch is left in an INTERMEDIATE state. When the Site Safety Latch is in this state, the Site Controller package and the workload packages on the site can be restarted only after cleaning the site. For more information on cleaning the Site Controller package, see “Cleaning the Site to Restart the Site Controller Package” (page 376). IMPORTANT: The Site Safety Latch is an internal mechanism, which is opened and closed automatically by its corresponding Site Controller package. Operators only need to configure the workload packages to use the Site Safety Latch. It need not be managed manually. Overview of SADTA Configuration A complex workload is configured redundantly by configuring it at each site sub-cluster. A Site Controller package is created to manage the workload which automatically creates the corresponding Site Safety Latch. The workload packages at each site are configured with the Site Controller package and the Site Safety Latch is configured with the appropriate package in the workload. Starting the Site Controller package starts the workload packages on the site. The Site Controller package then monitors the workload package as specified in its configuration. When a failure occurs that disrupts the application availability or when a disaster occurs that impacts the whole site, the Site Controller package fails over to an adoptive site node and starts the corresponding workload packages at the site of the adoptive node. The workload packages can be shutdown by halting the corresponding Site Controller package. Figure 7–2 describes two Oracle Database 10gR2 RAC configured in a SADTA Metrocluster. 
Each database workload has its own Site Controller package and Site Safety Latch. Overview of SADTA Configuration 331 Figure 7-2 Package View SADTA and Oracle Database 10gR2 RAC The Oracle RAC database can be deployed in a Metrocluster environment for disaster tolerance using SADTA. In this architecture, a disaster tolerant RAC database can be configured as two RAC databases that are replicas of each other; one at each site of the Metrocluster. At any given time in a Metrocluster, the RAC database at only one site is up and actively services clients while the other RAC database, which is a replica on the remote site, remains passive. The active RAC database data I/O is continuously replicated (synchronously or asynchronously) to the remote site using physical data replication technologies, such as Continuous Access with EVA, Continuous Access XP, or EMC SRDF. When the active RAC database fails, or when the site hosting the active RAC database is lost in a disaster, Metrocluster automatically initiates a site failover for the RAC database. A Metrocluster site failover activates the passive database configuration at the remote site by starting it using the replicated data in that site. After a successful site failover, the redundant RAC passive database becomes the new active RAC database. While configuring replicas of the RAC databases within a Metrocluster, it is important that the two sites in the Serviceguard cluster are configured by grouping cluster nodes based on the sites they are located in. If you have CVM or CFS configured in your 332 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture environment, an SGeRAC-enabled Oracle Clusterware sub-cluster and a Serviceguard Storage Management Suite CFS sub-cluster must be created at each site. Only nodes from the site on which the sub-clusters are created can be members of these clusters. For every site aware disaster tolerant RAC database, a RAC database must be configured at each site using the Oracle clusterware sub-cluster of the site. The database at each site uses the CFS sub-cluster file systems created over the local disk of a replicated disk pair. The RAC database processes, the disk groups, and file systems at each site are configured in a stack of inter-dependent MNP packages. The RAC database processes are packaged using the SGeRAC Toolkit (delivered as part of the SGeRAC product). The CVM DG MNP and CFS MP MNP packages are created using SG SMS commands for the disk groups and cluster file systems that are used to store the database. For more information on using SG SMS commands to create disk groups and cluster file systems, see the VERITAS Storage Foundation Cluster File System HP Serviceguard Storage Management Suite Extracts document available at http://docs.hp.com. In addition, check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest release notes for your version of Serviceguard for up-to-date information about support for CVM and CFS, available at: http://.docs.hp.com -> High Availability -> Serviceguard. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. For SADTA, a Site Controller package must be configured to provide robust site failover semantics for a site aware disaster tolerant RAC database. 
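The stack of inter-dependent MNP packages described above is held together with ordinary Serviceguard package dependencies. The exact dependencies are created by the SG SMS cfsdgadm and cfsmntadm commands and by the SGeRAC toolkit configuration covered later in this chapter; the following hypothetical excerpt from a RAC MNP package configuration file is only a sketch of what such dependencies can look like:

# Hypothetical excerpt from a RAC MNP package configuration file:
# the RAC instances start only on nodes where the site CRS sub-cluster
# MNP package and the CFS mount point MNP package are already up.
dependency_name        sfo_crs
dependency_condition   sfo_crs = up
dependency_location    same_node

dependency_name        sfo_hrdb_mp
dependency_condition   sfo_hrdb_mp = up
dependency_location    same_node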
The Site Controller package starts the configured local RAC MNP stack packages on the site where it is started. The Site Controller package monitors the started RAC MNP stack packages. When these packages fail, the Site Controller package fails over to the remote site. As part of its startup on the remote site node during failover, the Site Controller package prepares the replicated data storage and runs the passive RAC MNP stack packages in the remote site ensuring disaster tolerance for the database. Since a disaster tolerant RAC database has two identical but independent RAC databases configured over the replicated storage in a Metrocluster, it is important to prevent packages of both the site RAC MNP stack to be up and running simultaneously. If the packages of the redundant stack at both sites are running simultaneously, it leads to data corruption. SADTA provides a Site Safety Latch mechanism at the site nodes that prevents inadvertent simultaneous direct startup of the RAC MNP stack packages at both sites. SADTA and Oracle Database 10gR2 RAC 333 Multiple site aware disaster tolerant RAC databases can be configured in a Metrocluster. Figure 7–3 shows one such configuration with two site aware disaster tolerant RAC databases: hrdb and salesdb. Multiple RAC databases can be configured using a separate Site Controller package infrastructure for each RAC database. Each RAC database must have its own Site Controller package, Site Safety Latch, RAC MNP package stack, and replication disk group in the Metrocluster. The site-specific Oracle clusterware and CFS sub-clusters are common resources for all RAC databases. To add another RAC database to an existing Metrocluster, two replicas of the RAC database and the RAC MNP stack packages on both sites, in a separate Site Controller package, must be configured. Because the SADTA configuration requires two replicas of the RAC database configuration, the Oracle Network and Services must be configured accordingly for the disaster tolerant database clients to automatically reconnect to the new active site after a site failover is complete. For more information on configuring access for Oracle Database 10gR2 RAC, see “Configuring Client Access for Oracle Database 10gR2 RAC” (page 359). Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture This section describes how to configure Oracle Database 10gR2 RAC in a SADTA. This section addresses the following topics: • • • • • • • • • • • • • • 334 “Summary of Required Procedures” (page 335) “Sample Configuration” (page 337) “Configuring SADTA” (page 340) “Setting up Replication” (page 341) “Configuring Metrocluster” (page 341) “Installing and Configuring Oracle Cluster Ready Service (CRS)” (page 343) “Installing and Configuring Oracle Real Application Clusters (RAC)” (page 348) “Creating the RAC Database ” (page 348) “Creating Identical RAC Database at the Remote Site ” (page 352) “Configuring the Site Controller Package” (page 355) “Configuring the Site Safety Latch Dependencies” (page 356) “Starting the Disaster Tolerant RAC Database in the Metrocluster” (page 358) “Configuring Client Access for Oracle Database 10gR2 RAC” (page 359) “Configuring SGeRAC Cluster Interconnect Subnet Monitoring” (page 360) Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Summary of Required Procedures This section elaborates on the procedures required to configure Oracle Database 10gR2 RAC in a SADTA. 
To set up SADTA in your environment, you must perform the following steps: 1. 2. 3. Set up replication in your environment. Based on the existing arrays in your environment, see the respective chapters of this manual to set up replication. Install software for configuring Metrocluster. This includes: a. Creating Serviceguard Clusters b. Configuring Cluster File System Multi Node Package (MNP) Install Oracle. a. Install and configure Oracle Cluster Ready Service (Oracle Clusterware). b. Install and configure Oracle Real Application Clusters (RAC). c. Create RAC databases. d. Create identical RAC Databases at the remote site. Checklist for Configuring SADTA Use the following checklist while configuring SADTA in your environment. CAUTION: This checklist does not include detailed procedural information. Before installing any of the required software products, see the Disaster Tolerant Clusters Products Compatibility Feature Matrix available at http://docs.hp.com. 1. Install Serviceguard. The Serviceguard version must be compatible with the version of HP-UX installed. See the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix, at http:// docs.hp.com -> High Availability -> Serviceguard. 2. Install Serviceguard extension for Oracle RAC. Install SGeRAC with CFS to use CFS/CVM. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. 3. 4. Install the required SG and SGeRAC patches. For more information on the required Serviceguard and SGeRAC patches, see the Disaster Tolerant Clusters Products Compatibility Feature Matrix available at http://docs.hp.com. Install Metrocluster software. Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 335 5. Set up replication between sites. The replication mechanism varies depending on the arrays that are configured in the environment. See the respective chapters of this manual for setting up replication based on the arrays in your environment. 6. 7. 8. Create Serviceguard clusters, with sites defined in the cluster configuration file. Create the SG-CFS-SMNP, if CFS/CVM is configured in your environment. Install and configure Oracle Cluster Ready Service (Oracle Clusterware) a. Configure the Network b. Configure the Storage Device for Installing Oracle CRS c. Set up the CRS and VOTING directories d. Install the Oracle CRS software e. Configure SGeRAC Toolkit packages for the site CRS sub-cluster 9. Install and Configure Oracle RAC 10. Create the RAC Database. a. Set up file systems for RAC database data files. The RAC database can be configured to use CVM or SLVM raw volumes, or CFS file systems. If using CFS, create SG-CFS-MP packages. If using CVM, create SG-CFS-DG packages. If using SLVM, create appropriate SLVM volume groups with required raw volumes over the replicated disks. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. b. Set up file systems for RAC database flash recovery. If you have SLVM, CVM, or CFS configured in your environment, see the following documents available at http://docs.hp.com: Architecture Considerations and Best Practices for Architecting an Oracle 10g R2 RAC Solution with Serviceguard and SGeRAC Using Serviceguard Extension for RAC c. 
Create the RAC database using Oracle DBCA. d. Configure and test RAC MNP Stack at the local site. e. Halt the RAC database. 11. Create the identical RAC database at the remote site. 336 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture a. Configure the replica RAC database b. Configure the RAC MNP at the remote site. c. Halt the RAC database at the remote site. 12. 13. 14. 15. 16. Configure the Site Controller Package. Configure the Site Safety Latch Dependencies. Start the site aware disaster tolerant RAC database in the Metrocluster. Configure client access for the RAC database. Configure SGeRAC cluster interconnect subnet monitoring. The subsequent sections explain each of these steps in detail. Sample Configuration To illustrate the configuration procedures for SADTA, the subsequent sections describe how to install and configure a site aware disaster tolerant Oracle Database 10gR2 RAC in a Metrocluster. The configuration procedure involves multiple steps across multiple nodes. Use the worksheet in Appendix H to document key information that will be required during the solution configuration. Following is a sample configuration that will be used in subsequent sections to elaborate on the procedure to configure SADTA with Oracle Database 10gR2 RAC. In this sample configuration, following are the names that are used: • hrdb This is the Oracle Database 10gR2 RAC that is configured with two database instances, which is configured using SADTA in a Metrocluster environment. • dbcluster The Metrocluster that spans two cities, San Francisco and San Jose. • SFO_1 and SFO_2 The two nodes at the San Francisco site that are connected to a disk array that supports the SADTA feature. • SJC_1 and SJC_2 The two nodes at the San Jose site that are connected to a disk array that supports the SADTA feature. The disk arrays at the San Francisco and the San Jose sites have a physical replication link configured between them. The underlying Serviceguard cluster is configured in a cross-subnet environment and the two sites are defined in the configuration file as follows: SITE_NAME SITE_NAME san_francisco san_jose Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 337 The RAC database is created on a shared disk in an XP disk array, which is synchronously replicated as part of a single replication disk group, hrdb_devgroup. The configuration uses the CFS file system at the host for database storage management. Because the underlying Serviceguard cluster is configured with the site, there are two CFS sub-clusters: one at the San Francisco site with membership from SFO_1 and SFO_2 nodes and the other at the San Jose site with membership from SJC_1 and SJC_2 nodes. Figure 7-3 Sample Configuration To configure SADTA, two CRS sub-clusters; one at the San Francisco site and the other at the San Jose site, must be created. The Oracle Clusterware software must be installed at each site in the Metrocluster. The CRS daemons at the sub-clusters must be configured as a Serviceguard package using the SGeRAC toolkit. The CRS Home is installed on a file system that is local to a site. The CRS voting and OCR disks must not be configured for replication. Table 7–2 lists the CRS packages and other resources that form the CRS sub-cluster at each site. 
Table 7-2 CRS Sub-clusters configuration in the Metrocluster Site San Francisco San Jose sfo_crs sjc_crs Members SFO_1 and SFO_2 SJC_1 and SJC_2 CRS MNP sfo_crs sjc_crs sfo_crs_ic sjc_crs_ic CRS Cluster Name CRS Interconnect MNP CRS HOME 338 /opt/crs/oracle/product/10.2.0/crs /opt/crs/oracle/product/10.2.0/crs Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Table 7-2 CRS Sub-clusters configuration in the Metrocluster (continued) Site San Francisco San Jose /cfs/sfo_crs/OCR/ocr /cfs/sjc_crs/OCR/ocr CRS Voting Disk /cfs/sfo_crs/VOTE/vote /cfs/sjc_crs/VOTE/vote CRS mount point /cfs/sfo_crs /cfs/sjc_crs CRS MP MNP package sfo_crs_mp sjc_crs_mp CRS DG MNP package sfo_crs_dg sjc_crs_mp CVM DG Name sfo_crsdg sjc_crsdg CRS OCR Private IPs Virtual IPs 192.1.7.1 SFO_1p.hp.com 192.1.8.1 SJC_1p.hp.com 192.1.7.2 SFO_2p.hp.com 192.1.8.2 SJC_2p.hp.com 16.89.140.202 SFO_1v.hp.com 16.89.141.202 SJC_1v.hp.com 16.89.140.204 SFO_2v.hp.com 16.89.141.204 SJC_2v.hp.com In this example, two replicas of the RAC database need to be configured; one at San Francisco and the other at San Jose. The database must be created at the nodes in the San Francisco site and the configuration and data must be replicated to the nodes in the San Jose site. The RAC database software must be installed on the local file system at each node. The database uses two CFS file systems for database files; one for the database data and the other for flash recovery area. The RAC database must be configured using the SGeRAC toolkit at each site. The disk group names and file systems mount point paths must be the same for both the site databases. However, the disk groups and mount points are packaged using different packages at each site CFS sub-cluster. Table 7–3 lists the packages and other resources at each site. Table 7-3 Sample database configuration Site Details San Francisco San Jose RAC HRDB RAID Device group name hrdb_devgroup hrdb_devgroup RAC HRDB data files Disk Group name hrdbdg hrdbdg RAC flash area CVM Disk Group name flashdg flashdg HRDB HRDB hrdb1@ SFO_1 hrdb1@ SJC_1 hrdb2@ SFO_2 hrdb2@ SJC_2 RAC Database Name RAC Instances Instance @ Node RAC Home /opt/app/oracle/product/10.2.0/db /opt/app/oracle/product/10.2.0/db Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 339 Table 7-3 Sample database configuration (continued) Site Details San Francisco San Jose RAC MNP package name sfo_hrdb sjc_hrdb RAC HRDB data mount point /cfs/rac/ /cfs/rac/ RAC HRDB data MP MNP sfo_hrdb_mp sjc_hrdb_mp RAC HRDB data DG MNP sfo_hrdb_dg sjc_hrdb_dg RAC flash area mount point /cfs/flash /cfs/flash RAC flash area MP MNP sfo_flash_mp sjc_flash_mp RAC flash area DG MNP sfo_flash_dg sjc_flash_dg In this example, a Site Controller package titled hrdb_sc must be created to provide automatic site failover for the hrdb RAC database between San Francisco and San Jose. The RAC database MNP packages must be configured using the critical_package attribute, and the CFS MP MNP and CVM DG MNP database packages must be configured using the managed_package attribute. As a result, the Site Controller package monitors only the RAC database MNP package and initiates a site failover when it fails. The Site Controller package can be configured to monitor all the RAC MNP stack packages and initiate a site failover only when all the packages in the stack have failed or the site itself is lost in a disaster. 
In such a scenario, all RAC MNP stack packages must be configured with the managed_package attribute and no package must be configured with the critical_package attribute in the Site Controller package configuration file. Configuring SADTA This section describes the procedures that need to be followed to configure SADTA with Oracle Database 10gR2 RAC. To configure SADTA with Oracle Database 10gR2 RAC, complete the following steps: 1. Set up Replication. 2. Configure Metrocluster. 3. Install and Configure Oracle Cluster Ready Service (CRS). 4. Install and Configure Oracle Real Application Clusters (RAC). 5. Create the RAC Database. 6. Create the identical RAC Database at the remote site. 7. Configure the Site Controller package and the Site Safety Latch. 8. Configure client access for Oracle Database 10gR2 RAC. The subsequent sections discuss each step in detail. 340 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Setting up Replication The RAC database data files and the flash recovery area should be replicated between the site disk arrays. The underlying disks must be configured for replication. The replication mechanisms differ depending on the type of arrays in your environment. For more information on configuring replication for the arrays in your environment, see the respective chapters of this manual. Configuring Metrocluster To configure SADTA, a Serviceguard cluster must be created that comprises nodes from both sites. In this example, a Serviceguard cluster is created using nodes SFO_1, SFO_2, SJC_1, and SJC_2. This example also describes a cross subnet Serviceguard cluster. There are rules and guidelines to configure a cross subnet cluster. For more information on configuring cross-subnet Serviceguard clusters, see Chapter 1 (page 25). To configure Metrocluster, complete the following steps: 1. Create a Serviceguard cluster with the sites configured. 2. Configure the Cluster File System Multi Node Package (SMNP). The following sections describe each of these steps in detail. Creating a Serviceguard Cluster with Sites Configured Complete the following steps to create a Serviceguard cluster with sites configured: 1. Run the following command to create a cluster configuration file from any node: cmquerycl In this example, the command is: cmquerycl -v -C /etc/cmcluster/dbcluster.config -n SFO_1 -n\ SFO_2 -n SJC_1 -n SJC_2 -w full -q quorum.abc.com where quorum.abc.com is the host name of the Quorum Server. 2. Edit the /etc/cmcluster/dbcluster.config file to specify the site configuration. 
Following is a sample of the configuration file: SITE_NAME san_francisco SITE_NAME san_jose NODE_NAME sfo_1 SITE san_francisco NETWORK_INTERFACE lan2 #SG HB 1 HEARTBEAT_IP 192.1.3.1 NETWORK_INTERFACE lan3 #SG HB 2 HEARTBEAT_IP 192.1.5.1 NETWORK_INTERFACE lan4 #SFO_CRS CSS HB STATIONARY_IP 192.1.7.1 NETWORK_INTERFACE lan5 #SFO_CRS CSS HB standby Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 341 NETWORK_INTERFACE STATIONARY_IP NETWORK_INTERFACE lan1 # SFO client access 16.89.140.201 lan6 # SFO client access standby NODE_NAME sfo_2 SITE san_francisco NETWORK_INTERFACE lan2 #SG HB 1 HEARTBEAT_IP 192.1.3.2 NETWORK_INTERFACE lan3 #SG HB 2 HEARTBEAT_IP 192.1.5.2 NETWORK_INTERFACE lan4 # SFO_CRS CSS HB STATIONARY_IP 192.1.7.2 NETWORK_INTERFACE lan5 # SFO_CRS CSS HB standby NETWORK_INTERFACE lan1 # SFO client access STATIONARY_IP 16.89.140.203 NETWORK_INTERFACE lan6 # SFO client access standby NODE_NAME sjc_1 SITE san_jose NETWORK_INTERFACE lan2 #SG HB 3 HEARTBEAT_IP 192.1.6.1 NETWORK_INTERFACE lan3 #SG HB 4 HEARTBEAT_IP 192.1.4.1 NETWORK_INTERFACE lan4 #SJC_CRS CSS STATIONARY_IP 192.1.8.1 NETWORK_INTERFACE lan5 #SJC_CRS CSS NETWORK_INTERFACE lan1 # SJC client STATIONARY_IP 16.89.141.201 NETWORK_INTERFACE lan6 # SJC client NODE_NAME SITE NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE STATIONARY_IP NETWORK_INTERFACE NETWORK_INTERFACE STATIONARY_IP NETWORK_INTERFACE 3. sjc_2 san_jose lan2 #SG HB 3 192.1.6.2 lan3 # SG HB 4 192.1.4.2 lan4 #SJC_CRS CSS 192.1.8.2 lan5 #SJC_CRS CSS lan1 # SJC client 16.89.141.203 lan6 # SJC client HB HB standby access access standby HB HB standby access access standby Run the following command to apply the configuration file: cmapplyconf -v -C /etc/cmcluster/dbcluster.config 4. Run the following command to start the cluster: cmruncl After the cluster is started, you can run the cmviewcl command to view the site configuration. 342 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Configuring the Cluster File System Multi Node Package (SMNP) If Cluster File System (CFS) is configured in the environment, the Serviceguard CFS (SG CFS) package must be configured. To create a SG CFS SMNP package, run the cfscluster command. After creating the package, ensure that there are two CFS sub-clusters in the Metrocluster. Run the following command on any node, at both sites, to view the list of nodes and the status of each node: cfscluster status Following is the output that is displayed: Node : SFO_1 Cluster Manager : up CVM state : up (MASTER) MOUNT POINT TYPE SHARED VOLUME /cfs/crs regular crs_vol Node : SFO_2 Cluster Manager : up CVM state : up MOUNT POINT TYPE SHARED VOLUME /cfs/crs regular crs_vol DISK GROUP crs_dg_siteA DISK GROUP crs_dg_siteA STATUS MOUNTED STATUS MOUNTED IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. Installing and Configuring Oracle Cluster Ready Service (CRS) After setting up replication in your environment and configuring the Metrocluster, you must install Oracle Cluster Ready Service (CRS). Use the Oracle Universal Installer to install and configure the CRS cluster. Because SADTA requires two CRS sub-clusters, one at each site, you must install and configure Oracle CRS twice in the Serviceguard cluster. 
When you install Oracle CRS at a site, the sub-cluster installation is confined to a site. The CRS storage is not replicated. As a result, Oracle CRS must be installed on a local file system at each node in the site. The Oracle Cluster Registry (OCR) and Voting disks must be shared only among the nodes in the site. In this example, CFS is shared between the site nodes for the OCR and Voting disks. Install Oracle CRS, one site at a time. Following sections describe the steps to install and configure the CRS sub-cluster at both sites. Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 343 Configuring the Network Complete the following procedure to configure the network: 1. Identify the Oracle virtual IP and private IP to be used in the site CRS sub-cluster and enter them in the /etc/hosts file on all the nodes in the site. All nodes in the site CRS sub-cluster must be able to resolve the private and virtual IP of all other nodes in the site CRS sub-cluster. In this sample configuration, for the SFO CRS sub-cluster, the following entries must be made in the /etc/hosts file of SFO_1 and SFO_2 nodes. 192.1.7.1 192.1.7.2 16.89.140.202 16.89.140.204 2. SFO_1p.hp.com SFO_2p.hp.com SFO_1v.hp.com SFO_2v.hp.com SFO_1p SFO_2p SFO_1v SFO_2v Configure the appropriate host equivalence for the oracle user. When installing Oracle RAC and the database software, host equivalence for the oracle user must be configured only among the nodes in the same site. Add entries in the .rhosts file of the Oracle user for every network address of nodes in the site. Do not include the nodes in the other site. In this example, following are the entries that are included in the .rhosts file: SFO_1 SFO_2 SFO_1p SFO_2p SFO_1v SFO_2v oracle oracle oracle oracle oracle oracle After installing and configuring all Oracle software, the host equivalence for the oracle user across the sites can be configured. 3. Update the /home/oracle/.profile file for the Oracle user and set the ORACLE_SID environment variable using the RAC database instance name that will run in the node. Following is the .profile file for this example: export ORACLE_BASE=/opt/app/oracle export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1 export ORA_CRS_HOME=/opt/crs/oracle/product/10.2.0/crs LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib: $ORACLE_HOME/rdbms/lib SHLIB_PATH=$ORACLE_HOME/lib32:$ORACLE_HOME/rdbms/lib32 export LD_LIBRARY_PATH SHLIB_PATH export PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin: /usr/local/bin: CLASSPATH=$ORACLE_HOME/jre:$ORACLE_HOME/jlib: $ORACLE_HOME/rdbms/jlib:$ORACLE_HOME/network/jlib export CLASSPATH export ORACLE_SID= 344 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Configuring the Storage Device for Installing Oracle CRS When you install Oracle Clusterware, it is installed on a local file system on the CRS sub-cluster nodes of the site. As a result, you need to complete the following steps on all nodes at the site: 1. Create a directory path for Oracle CRS Home, set an owner, and specify appropriate permissions. 2. Create an Oracle directory to save installation logs, set an owner, and specify appropriate permissions. 3. Create mount points on all nodes in the site for a CFS file system where the CRS sub-cluster OCR and Voting files will be stored. Setting Up CRS OCR and VOTING Directories The shared storage for storing OCR and VOTING data can be configured using SLVM, or CVM, or CFS. 
When using SLVM or CVM, a separate SLVM volume group or CVM disk groups, with all required raw volumes should be configured using non replicated disks. For more information on using raw devices for OCR and VOTING storage, see the Oracle® Clusterware Installation Guide available at the Oracle documentation site. This CRS storage is however not required to be replicated in SADTA. The current example will use CFS file system for OCR and VOTING. The following procedure describes configuring CFS for OCR and VOTING data. However, this CRS storage is need not be replicated in SADTA. 1. Initialize the disk that will be used for the CFS file system from the CVM master node at the site. /etc/vx/bin/vxdisksetup -i c4t0d3 NOTE: This disk should be non replicated shared disk connected only to the nodes in the CRS sub-cluster site. 2. From the site CVM master node, create the CRS disk group. vxdg –s init sfo_crsdg c4t0d3 3. Create the Serviceguard Disk Group MNP packages for the disk group. cfsdgadm add sfo_crsdg sfo_crs_dg all=sw SFO_1 SFO_2 4. Activate the CVM DG in the site CFS sub-cluster. cfsdgadm activate sfo_crsdg 5. Create a volume for the CRS disk group. vxassist -g sfo_crsdg make crs_vol 500m 6. Create a file system using the created volume. newfs -F vxfs /dev/vx/rdsk/sfo_crsdg/crs_vol Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 345 7. Create Serviceguard Mount Point MNP packages for the clustered file system. cfsmntadm add sfo_crsdg crs_vol /cfs/sfo_crs sfo_crs_mp\ all=rw SFO_1 SFO_2 8. Mount the clustered file system on the site CFS sub-cluster. cfsmount /cfs/sfo_crs 9. Create the CRS OCR directory in the clustered file system. mkdir /cfs/sfo_crs/OCR chmod 755 /sfo_cfs/crs/OCR 10. Create the CRS VOTE directory in the clustered file system. mkdir /cfs/sfo_crs/VOTE chmod 755 /cfs/sfo_crs/VOTE 11. Set oracle as the owner for the CRS directories. chown –R oracle:oinstall /cfs/sfo_crs After setting owners for the OCR and Voting directories, you can install and configure Oracle CRS. Installing and Configuring Oracle CRS This section describes the procedure to install and configure Oracle CRS. Use the Oracle Universal Installer to install Oracle CRS. For information on installing Oracle CRS using the Oracle Universal Installer, see the Oracle Real Application Clusters Installation and Configuration Guide available at the Oracle documentation site. When selecting the nodes for the CRS sub-cluster on a site, select only the nodes configured in under this site in the Serviceguard cluster. The following procedure describes how to install Oracle CRS with the Universal Installer in the sample environment. 1. Ensure that the appropriate host equivalence for the oracle user is configured. When installing Oracle CRS software, host equivalence for oracle user must be configured only among the nodes in the same site. 2. Login with the Oracle credentials on a node in the site. 3. Copy the Oracle CRS installation software to this node. 4. Run the following command to start the Oracle Universal Installer to install the CRS software. \ /clusterware/runInstaller This command starts the Oracle Universal Installer graphical user interface. Ensure that the DISPLAY environment variable is set appropriately. 346 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture 5. 6. Provide appropriate values at each screen till you reach the Specify Cluster Configuration screen. At the Specify Cluster Configuration screen, complete the following steps: a. 
Select only nodes that belong to the current site. b. Specify the site-specific CRS name as the Cluster Name. In this example, for the SFO CRS sub-cluster, select only the San Francisco site nodes SFO_1 and SFO_2. Do not include any other nodes. Specify the CRS sub-cluster name as sfo_crs. 7. In the Specify Oracle Cluster Registry (OCR) Location screen, select External Redundancy and specify the CFS file system directory if you have an independent backup mechanism for the OCR. To use the internal redundancy feature of Oracle, select Normal Redundancy and specify additional locations. In this example, for the SFO CRS sub-cluster, the location is specified as: /cfs/sfo_crs/OCR/ocr 8. In the Specify Voting Disk Location screen, select External Redundancy and specify the CFS file system directory if you have an independent backup mechanism for the Voting Disk. To use the internal redundancy feature of Oracle, select Normal Redundancy and specify additional locations. In this example, for the SFO CRS sub-cluster, the location is specified as: /cfs/sfo_crs/VOTE/vote 9. Complete the remaining on-screen instructions to complete the installation. Once the installation is complete, you must ensure that Oracle CRS is installed appropriately, and that the CRS sub-cluster is formed. To ensure that Oracle CRS is installed appropriately, check if the /opt/crs/oracle/product/10.2.0/crs/ bin/crsd.bin and /opt/crs/oracle/product/10.2.0/crs/bin/ocssd.bin processes are running on all nodes in the current site. To ensure that the CRS sub-cluster if formed, run the olsnodes -n command. This command lists the nodes of the CRS sub-cluster. In this example, this command lists the nodes of the SFO CRS sub-cluster. olsnodes -n SFO_1 SFO_2 1 2 Configuring SGeRAC Toolkit Packages for the site CRS Sub-cluster To configure SADTA, the CRS daemons must be managed through Serviceguard. As a result, the CRS sub-cluster at the site must be packaged using the SGeRAC toolkit. Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 347 This configuration must be done at both sites in the Metrocluster. For information on configuring the CRS packages, see the Use of Serviceguard Extension for RAC Toolkit with Oracle 10g RAC manual available at http://docs.hp.com. Installing and Configuring Oracle Real Application Clusters (RAC) The Oracle RAC software must be installed twice in the Metrocluster, once at each site. Also, the RAC software must be installed in the local file system in all the nodes in a site. To install Oracle RAC, you can use the Oracle Universal Installer (OUI). After installation, the installer prompts you to create the database. Do not create the database until you install Oracle RAC at both sites. You must create identical RAC databases only after installing RAC at both sites. For information on installing Oracle RAC, see the documents available at the Oracle documentation site. This section describes the high-level steps to install Oracle RAC. Complete the following procedure to install Oracle RAC in the Metrocluster: 1. Start the Oracle Universal Installer from the temporary directory. For example: /database/runInstaller 2. 3. 4. 5. On the Specify Home Details screen, specify the local file system directory where the RAC software will be installed on the site nodes. On the Specify Hardware Cluster Installation Mode screen, select Cluster Installation and select the nodes in addition to the local node. 
On the Select Configuration Option screen, select the Install Database Software Only option. Create a listener on both nodes of the site using Oracle NETCA. For more information on using NETCA to configure listeners in a CRS cluster, see the Oracle RAC Installation and Configuration user’s guide. After installing Oracle RAC, you must create the RAC database. Creating the RAC Database After installing Oracle RAC, you must create the RAC database from the site which has the source disks of the replication. In this manual, this site is referred to as the local site. The RAC database creation is replicated to the remote site through physical replication and the identical RAC database can be configured on the remote site from the replication target disks. In our example configuration, a database, hrdb, is created from the San Francisco site. This database is replicated to the San Jose site. After the RAC database is created at the San Francisco site, the identical RAC database must be configured at the San Jose site. 348 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture The example hrdb RAC database uses the Cluster File System for storing its data files. The Cluster File System for the RAC data files is created over the replicated disk array disk group. There are two file systems for each RAC database files; one for the database data files and the other for the flash recovery area. The subsequent sections describe the procedures to set up the file systems for RAC database files. The RAC database can also be configured to use CVM or SLVM raw volumes. As a result, appropriate CVM disk groups or SLVM volume groups must be created with required raw volumes over the replicated disks. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. Setting up CFS File Systems for RAC Database Data Files This section describes how to create CFS file systems for RAC database data files. If you have SLVM configured in your environment, then you must create shared LVM volume groups for the RAC database and import them on all the nodes. For more information on creating shared LVM volume groups for the RAC database, see the Using Serviceguard Extension for RAC manual available at http://docs.hp.com. If you have CVM in your environment, then you must configure CVM disk group MNP packages for all the nodes in the site using the cfsdgadm command and then create volumes. For more information on creating CVM disk group MNP packages, see the Serviceguard Extension for Oracle RAC manual available at http://docs.hp.com. IMPORTANT: • CFS/CVM support in SADTA requires appropriate version of HP Serviceguard Storage Management Suite software. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. The following procedure explains the steps to configure the CFS file systems for the example hrdb database. Complete the following procedure on the CFS cluster master node to set up the CFS file systems: 1. Initialize the source disks of the replication pair: /etc/vx/bin/vxdisksetup -i c4t0d1 Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 349 /etc/vx/bin/vxdisksetup -i c4t0d2 2. Create a disk group for the RAC database data files: vxdg –s init hrdbdg c4t0d1 c4t0d2 3. 
Create Serviceguard Disk Group MNP packages for the disk groups: cfsdgadm add hrdbdg sfo_hrdb_dg all=sw SFO_1 SFO_2 4. Activate the CVM disk group in the local site CFS sub-cluster: cfsdgadm activate hrdbdg 5. Create a volume from the disk group: vxassist -g hrdbdg make rac_vol 4500m 6. Create a file system using the created volume: newfs -F vxfs /dev/vx/rdsk/hrdbdg/rac_vol 7. Create mount points for the RAC database data files and set appropriate permissions: mkdir /cfs chmod 775 /cfs mkdir /cfs/rac 8. Create the Mount Point MNP packages: cfsmntadm add hrdbdg rac_vol /cfs/rac sfo_hrdb_mp all=rw\ SFO_1 SFO_2 9. Mount the cluster file system on the CFS sub-cluster: cfsmount /cfs/rac 10. Create a directory structure for the RAC database data files in the cluster file system. Set proper permission and owners for the directory: chmod 775 /cfs/rac mkdir /cfs/rac/oradata chmod 775 /cfs/rac/oradata chown oracle:oinstall /cfs/rac/oradata Setting up CFS File Systems for RAC Database Flash Recovery This section describes how to create CFS file systems for RAC database flash recovery. If you have SLVM, CVM, or CFS configured in your environment, see the following documents available at http://docs.hp.com: 350 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture • • Architecture Considerations and Best Practices for Architecting an Oracle 10g R2 RAC Solution with Serviceguard and SGeRAC Using Serviceguard Extension for RAC The following procedure explains the steps to configure the CFS file systems for flash recovery for the example hrdb database. Complete the following procedure on the CFS cluster master node to set up the CFS file systems: 1. Initialize the source disks of the replication pair. /etc/vx/bin/vxdisksetup -i c4t0d4 /etc/vx/bin/vxdisksetup -i c4t0d5 2. Create a disk group using the above initialized disks vxdg –s init flashdg c4t0d4 c4t0d5 3. Create Serviceguard Disk Group MNP package for the disk group. cfsdgadm add flashdg sfo_flash_dg all=sw SFO_1 SFO_2 4. Activate the disk group in the site CFS sub-cluster cfsdgadm activate flashdg 5. Create a volume from the entire disk group. vxassist -g flashdg make flash_vol 4500m 6. Create a file system using the above created volume. newfs -F vxfs /dev/vx/rdsk/flashdg/flash_vol 7. Create mount points for the RAC Database flash logs and flash area mkdir /cfs chmod 775 /cfs mkdir /cfs/flash 8. Create Mount Point MNP package for the cluster file system. cfsmntadm add flashdg flash_vol /cfs/flash sfo_flash_mp\ all=rw SFO_1 SFO_2 9. Mount the RAC database flash recovery file system in the site CFS sub-cluster. cfsmount /cfs/flash 10. Create directory structure in the cluster file system for the RAC database flash recovery area chmod 775 /cfs/flash cd /cfs/flash mkdir flash Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 351 chmod 775 flash chown oracle:oinstall flash Creating the RAC Database using the Oracle Database Configuration Assistant After setting up the file systems for the RAC database data files, you must create the RAC database. You can use the Oracle Database Configuration Assistant (DBCA) to create the RAC database. After you login to the DBCA, select the Cluster File System option as the storage mechanism for the database and select the Common Location for all Database Files option to store database files. 
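Before creating the database with DBCA, you may want to confirm that the data and flash recovery file systems are mounted and their MNP packages are up on the local site nodes. The following is one way to do this, using commands already shown in this chapter; the output differs depending on your environment:

# Verify that the CVM DG and CFS MP packages for the database are up
# and that the file systems are mounted on the San Francisco nodes.
cfscluster status
bdf /cfs/rac /cfs/flash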
Configuring and Testing RAC MNP Stack at the Local Disk Site To configure SADTA, the RAC database that is configured on both sites must be managed by Serviceguard. As a result, the RAC database must be packaged in Serviceguard MNP packages. Also, automatic startup of RAC Database instances and services on CRS startup must be disabled. For more information on disabling automatic startup of RAC databases, see the How To Remove CRS Auto Start and Restart for a RAC Instance document available at the Oracle documentation site. For information on configuring the RAC database in the MNP packages, see the HP SGeRAC Toolkit README. Configure the RAC MNP package to have dependency on the site’s CRS sub-cluster MNP package. This step creates the RAC MNP stack at this site that is configured to be managed by the Site Controller package. Before halting the RAC MNP Stack, test the configuration to ensure that the packages are configured appropriately and can be started. Halting the RAC Database on the Local Disk Site After creating the RAC database on the local disk site, you must halt it to replicate it on the target disk site. You need to first halt the RAC MNP stack on the node in the source disk and then use the vxdg deport command to deport the disk groups at the nodes in the replication source disk. Creating Identical RAC Database at the Remote Site In the earlier procedures, the RAC database was created at the site with the source disk of the replication disk group. A RAC MNP stack was also created at the site. Now, an identical RAC database using the target replicated disk must be configured with the RAC MNP stack at the remote site. Prior to creating an identical RAC database at the remote site, you must first prepare the replication environment. The replication setup depends on the type of arrays that are configured in your environment. Based on the arrays in your environment, see the respective chapters of this manual to configure replication. After configuring replication in your environment, configure the replica RAC database. 352 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Configuring the Replica RAC Database Complete the following procedure to configure the replica RAC database. 1. Copy the first RAC database instance pfile from the source site to the target site first RAC database instance node. In this example, copy the RAC database instance pfile from the SFO_1 node to the SJC_1 node. cd /opt/app/oracle/product/10.2.0/db_1/dbs rcp -p inithrdb1.ora SJC_1:$PWD The -p option retains the permissions of the file. 2. Setup the first RAC database instance on the target site. In this example, run the following commands from the SJC_1 node. cd /opt/app/oracle/product/10.2.0/db_1/dbs ln -s /cfs/rac/oradata/hrdb/orapwhrdb orapwhrdb1 chown -h oracle:oinstall orapwhrdb1 chown oracle:oinstall inithrdb1.ora 3. Copy the second RAC database instance pfile from the source site to the target site second RAC database instance node. In this example, copy the RAC database instance pfile from the SFO_2 node to the SJC_2 node. cd /opt/app/oracle/product/10.2.0/db_1/db rcp -p inithrdb2.ora SJC_2:$PWD The -p option retains the permissions of the file. 4. Set up the second RAC database instance on the target site. In this example, run the following commands from the SJC_2 node. cd /opt/app/oracle/product/10.2.0/db_1/dbs ln -s /cfs/rac/oradata/hrdb/orapwhrdb orapwhrdb2 chown oracle:oinstall inithrdb2.ora chown -h oracle:oinstall orapwhrdb2 5. 
Create the Oracle admin directory. cd /opt/app/oracle rcp -r admin SJC_1:$PWD rcp -r admin SJC_2:$PWD chown -R oracle:oinstall /opt/app/oracle/admin chown -R oracle:oinstall admin Configuring Oracle Database 10gR2 RAC in a Site Aware Disaster Tolerant Architecture 353 6. Log in at any of the nodes in the remote site using the oracle user credentials. su – oracle 7. 8. Configure a listener for the database on this site using the Oracle Network Configuration Assistant (NETCA). Copy the tnsnames.ora file from the remote CRS sub-cluster and modify it to fit the local environment. In this example, the file contents would appear as follows: rcp SFO_1:$ORACLE_HOME/network/admin/tnsnames.ora SJC_1:$ORACLE_HOME/network/admin/tnsnames.ora rcp SFO_2:$ORACLE_HOME/network/admin/tnsnames.ora SJC_2:$ORACLE_HOME/network/admin/tnsnames.ora 9. Edit the tnsnames.ora file on the local nodes and modify the HOST = keywords to specify node names of this site. In this example, you must edit the tnsnames.ora file on the local nodes, SJC_1 and SJC_2. 10. Register the database with the CRS sub-cluster on remote site. srvctl add database -d hrdb -o /opt/app/oracle/product/10.2.0/db_1/ srvctl add instance -d hrdb -i hrdb1 -n SJC_1 srvctl add instance -d hrdb -i hrdb2 -n SJC_2 After registering the database with the CRS sub-cluster on the remote site, you can run the srvctl status command to view the health of the database. Configuring the RAC MNP Stack at the Target Disk Site The RAC database must be packaged as Serviceguard MNP packages. You must configure the RAC MNP package to have a dependency on the site CRS sub-cluster MNP package. This step creates the RAC MNP stack at the target site that will be configured to be managed by the Site Controller package. For more information on configuring the RAC database in MNP packages, see the Serviceguard Extension for Oracle RAC toolkit README. Halting the RAC Database on the Target Disk Site You must halt the RAC database on the target disk site so that it can be restarted at the source disk site. Use the cmhaltpkg command to halt the RAC MNP stack on the replication target disk site node. Deport the disk groups at the replication target disk site nodes using the vxdg deport command. 354 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Configuring the Site Controller Package The Site Controller package and the Site Safety Latch for the disaster tolerant RAC database are the final two components to be configured. This section describes the procedure to configure the Site Controller package in a Metrocluster. Complete the following procedure to configure the Site Controller package: 1. Create the Site Controller package directory on all nodes at both sites: mkdir -m 755 /etc/cmcluster/hrdb_sc 2. From any node, create a Site Controller package configuration file using the dts/ sc module: cmmakepkg -m dts/sc /etc/cmcluster/hrdb_sc/hrdb_sc.config 3. Edit the hrdb_sc.config file and specify a name for the package_name attribute: package_name hrdb_sc 4. Edit the hrdb_sc.config file and specify the node_name parameter explicitly. You must list the nodes from one site followed by the other site: node_name SFO_1 node_name SFO_2 node_name SJC_1 node_name SJC_2 5. Edit the hrdb_sc.config file and specify the directory created in step 1 for the dts_pkg_dir attribute. dts_pkg_dir /etc/cmcluster/hrdb_sc 6. Specify a name for the log file. It is recommended that this file be located in the dts_pkg_dir directory. 
script_log_file /etc/cmcluster/hrdb_sc/hrdb_sc.log
7. Specify the sites without any packages. Do not specify any critical_package or managed_package.
site san_francisco
site san_jose
8. Create the Metrocluster environment file from one node in the cluster. Copy the template file to the Site Controller package directory, and edit the Metrocluster environment file to match your environment.
cp /opt/cmcluster/toolkit/SGCA/xpca.env /etc/cmcluster/hrdb_sc/hrdb_sc_xpca.env
This command illustrates copying the template file for Metrocluster with Continuous Access XP.
9. Distribute the environment file to all nodes under the Site Controller package directory. From the SFO_1 node:
rcp /etc/cmcluster/hrdb_sc/hrdb_sc_xpca.env SFO_2:/etc/cmcluster/hrdb_sc/
rcp /etc/cmcluster/hrdb_sc/hrdb_sc_xpca.env SJC_1:/etc/cmcluster/hrdb_sc/
rcp /etc/cmcluster/hrdb_sc/hrdb_sc_xpca.env SJC_2:/etc/cmcluster/hrdb_sc/
10. Apply the empty Site Controller package configuration file. Ensure that no critical_package or managed_package is configured in the Site Controller package configuration file.
cmapplyconf -P /etc/cmcluster/hrdb_sc/hrdb_sc.config
When the Site Controller package configuration is applied, the corresponding Site Safety Latch is also configured automatically in the cluster. Use the resls command to view the Site Safety Latch resources.
resls -q -s /dts/mcsc/hrdb_sc
Following is the output that is displayed:
/dts/mcsc/hrdb_sc: Resource Instance
The current value of the resource is DOWN (0)
Configuring the Site Safety Latch Dependencies
After the Site Controller package configuration is applied, the corresponding Site Safety Latch is also configured automatically in the cluster. This section describes the procedure to configure the Site Safety Latch dependencies.
Complete the following procedure to configure the Site Safety Latch dependencies:
1. If you have CVM or CFS configured in your environment, add the EMS resource dependency to all DG MNP packages in the RAC MNP stack on both sites. Run the following commands from a node on both sites:
cfsdgadm add_ems hrdbdg /dts/mcsc/hrdb_sc 40 "!= DOWN"
cfsdgadm add_ems flashdg /dts/mcsc/hrdb_sc 40 "!= DOWN"
NOTE: Specify the condition as != DOWN, with a space before the word DOWN. Omitting the space causes the cfsdgadm command to fail.
If you have SLVM configured in your environment, add the EMS resource details to the RAC database package configuration file:
RESOURCE_NAME /dts/mcsc/hrdb_sc
RESOURCE_POLLING_INTERVAL 120
RESOURCE_UP_VALUE != DOWN
RESOURCE_START = AUTOMATIC
You must apply the modified RAC database package configuration using the cmapplyconf command.
2. Verify the Site Safety Latch resource configuration at both sites. If you have CVM or CFS configured in your environment, run the cfsdgadm show_ems command.
cfsdgadm show_ems hrdbdg
RESOURCE_NAME /dts/mcsc/hrdb_sc
RESOURCE_POLLING_INTERVAL 40
RESOURCE_UP_VALUE != DOWN
RESOURCE_START = AUTOMATIC
cfsdgadm show_ems flashdg
RESOURCE_NAME /dts/mcsc/hrdb_sc
RESOURCE_POLLING_INTERVAL 40
RESOURCE_UP_VALUE != DOWN
RESOURCE_START = AUTOMATIC
If you have SLVM configured in your environment, run the following command to view the EMS resource details:
cmviewcl -v -p <package_name>
3. Configure the Site Controller package with both site RAC MNP stack packages.
site san_francisco
critical_package sfo_hrdb
managed_package  sfo_hrdb_dg
managed_package  sfo_hrdb_mp
managed_package  sfo_flash_dg
managed_package  sfo_flash_mp

site san_jose
critical_package sjc_hrdb
managed_package  sjc_hrdb_dg
managed_package  sjc_hrdb_mp
managed_package  sjc_flash_dg
managed_package  sjc_flash_mp
4. Re-apply the Site Controller package configuration.
cmapplyconf -v -P /etc/cmcluster/hrdb_sc/hrdb_sc.config
After applying the Site Controller package configuration, you can run the cmviewcl command to view the packages that are configured.
Starting the Disaster Tolerant RAC Database in the Metrocluster
At this point, you have completed configuring SADTA in your environment with the Oracle Database 10gR2 RAC. This section describes the procedure to start the disaster tolerant RAC database in the Metrocluster.
Complete the following procedure to start the disaster tolerant RAC database:
1. Run the cmviewcl command to view the disaster tolerant RAC database configuration in a Metrocluster. Following is a sample output:
cmviewcl

CLUSTER        STATUS
dbcluster      up

  SITE_NAME    san_francisco
  NODE         STATUS      STATE
  SFO_1        up          running
  SFO_2        up          running

  SITE_NAME    san_jose
  NODE         STATUS      STATE
  SJC_1        up          running
  SJC_2        up          running

MULTI_NODE_PACKAGES

  PACKAGE        STATUS    STATE     AUTO_RUN    SYSTEM
  SG-CFS-pkg     up        running   enabled     yes
  sfo_crs_dg     up        running   enabled     no
  sfo_crs_mp     up        running   enabled     no
  sfo_crs        up        running   enabled     no
  sjc_crs_dg     up        running   enabled     no
  sjc_crs_mp     up        running   enabled     no
  sjc_crs        up        running   enabled     no
  sfo_hrdb_dg    down      halted    enabled     no
  sfo_hrdb_mp    down      halted    enabled     no
  sjc_hrdb_dg    down      halted    enabled     no
  sjc_hrdb_mp    down      halted    enabled     no
  sfo_flash_dg   up        running   enabled     no
  sfo_flash_mp   up        running   enabled     no
  sjc_flash_dg   up        running   enabled     no
  sjc_flash_mp   up        running   enabled     no
  sfo_hrdb       down      halted    disabled    no
  sjc_hrdb       down      halted    disabled    no

UNOWNED_PACKAGES

  PACKAGE        STATUS    STATE     AUTO_RUN    NODE
  hrdb_sc        down      halted    disabled    unowned
2. Enable all nodes in the Metrocluster for the Site Controller package.
cmmodpkg -e -n SFO_1 -n SFO_2 -n SJC_1 -n SJC_2 hrdb_sc
3. Start the Site Controller package.
cmmodpkg -e hrdb_sc
The Site Controller package, along with the RAC MNP stack, starts up on the local site (san_francisco).
4. Check the Site Controller package log file, /etc/cmcluster/hrdb_sc/hrdb_sc.cntl.log, to ensure a clean startup.
Configuring Client Access for Oracle Database 10gR2 RAC
In Oracle Database 10gR2 RAC, the Oracle Clusterware configuration provides Virtual IP addresses (VIPs) through which database clients, external to the cluster, connect to the database. Oracle listeners gather information about service availability on the RAC servers and assist in making client connections to the RAC instances. Additionally, they provide failure notifications and load advisories to clients, thereby enabling fast failover of client connections and client-side load balancing. These capabilities are facilitated by an Oracle 10g feature called Fast Application Notification (FAN).
For more information on Fast Application Notification, see the following documents:
http://www.oracle.com/technology/products/database/clustering/pdf/twpracwkldmgmt.pdf
http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_ClientFailoverBestPractices.pdf
FAN capabilities can be accessed by the client application using the FAN API directly or by using FAN-integrated clients provided by Oracle.
The Metrocluster RAC configuration uses two Oracle sub-clusters in a single SGeRAC cluster. At any time, a given database is accessed through only one of the Oracle sub-clusters, which is referred to as the active sub-cluster for the database. The client connectivity features related to fast failover and load balancing mentioned above are available from the active sub-cluster. When the database fails over to the other sub-cluster, the same features are available from that sub-cluster; however, the client connections must be made to the VIPs configured in that Oracle sub-cluster.
There are several factors that limit the speed of client reconnection when the database fails over across sub-clusters. Following is the sequence of steps that occurs when a database fails over:
• Ensures that the database has been completely shut down at the formerly active sub-cluster.
• Fails over the disk device group to the newly active sub-cluster so that the database replica LUNs become available for read-write access.
• Starts the CVM disk groups and CFS mount points for the database at the newly active sub-cluster, and then starts the RAC database there.
While these steps are being performed, client connections cannot be made to the database. Also, no FAN event is delivered to indicate a site failover, so existing client connections may, in some cases, be subject to delays as long as the TCP keepalive timeout before a reconnect is attempted. To automatically reconnect clients to the database on a site failover, the Oracle Net service names must include the VIPs configured at both sub-clusters. For example:
hr_serv1 =
 (DESCRIPTION =
  (ADDRESS_LIST =
   (ADDRESS = (PROTOCOL = TCP)(HOST = SFO_1v.hp.com)(PORT = 1521))
   (ADDRESS = (PROTOCOL = TCP)(HOST = SFO_2v.hp.com)(PORT = 1521))
   (ADDRESS = (PROTOCOL = TCP)(HOST = SJC_1v.hp.com)(PORT = 1521))
   (ADDRESS = (PROTOCOL = TCP)(HOST = SJC_2v.hp.com)(PORT = 1521))
   (LOAD_BALANCE = yes)
  )
  (CONNECT_DATA =
   (SERVICE_NAME = hr_serv1)
  )
 )
Configuring SGeRAC Cluster Interconnect Subnet Monitoring
SGeRAC provides a feature to monitor the Oracle Clusterware interconnect subnet and to ensure that at least one RAC instance survives when a failure takes down the entire interconnect subnet in the cluster. To configure this feature, the interconnect subnet must be specified in a separate MNP package using the CLUSTER_INTERCONNECT_SUBNET package attribute. The CRS MNP package for the CRS sub-cluster must have a dependency specified on this interconnect MNP package. For more information on network planning for Oracle Clusterware communication, see the Using Serviceguard Extension for RAC manual available at http://docs.hp.com.
The Oracle Clusterware interconnect subnet for a site CRS sub-cluster is a subnet spanning only the nodes in that site (it is not required to route it across the sites). The interconnect subnet at each site is packaged in a separate MNP package.
Complete the following procedure to configure the SGeRAC Cluster Interconnect packages:

1. Create a package directory on all nodes in the site.

   mkdir -p /etc/cmcluster/pkg/sfo_ic

2. Create a package configuration file and control script file. Use site-specific names for the files. You must follow the legacy package creation steps.

   cmmakepkg -p sfo_ic.conf
   cmmakepkg -s sfo_ic.cntl

3. Specify a site-specific package name in the package configuration file.
4. Specify only the nodes in the site for the NODE_NAME parameter.
5. Specify the package type as MULTI_NODE.
6. Specify the SGeRAC cluster interconnect subnet in the CLUSTER_INTERCONNECT_SUBNET parameter.
7. Save and apply the package configuration file.

   cmapplyconf -P sfo_ic.conf

Configuration and Administration Restrictions

Following are the configuration and administration restrictions that apply to SADTA configurations for application workloads:

•   Only Oracle Database 10gR2 RAC is supported in SADTA. Other workloads are not supported.
•   Only two sites can be configured in a Metrocluster configuration.
•   All Serviceguard restrictions that apply to site configurations also apply to configuring SADTA.
•   For a RAC database that is configured in SADTA, the redundant database configuration at each site sub-cluster must have the same number of instances.
•   CFS/CVM support in SADTA requires an appropriate version of the HP Serviceguard Storage Management Suite software.
•   VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster.

Understanding Site Failover in a Site Aware Disaster Tolerant Architecture

This section describes how various site failover scenarios are addressed in SADTA. This section addresses the following topics:

•   “Node Failure” (page 362)
•   “Site Failure” (page 362)
•   “Site Failover” (page 362)
•   “Site Controller Package Failure” (page 364)
•   “Network Partitions Across Sites” (page 364)
•   “Disk Array and SAN Failure” (page 365)
•   “Replication Link Failure” (page 365)
•   “Oracle Database 10gR2 RAC Failure” (page 365)
•   “Oracle Database 10gR2 RAC Instance Failure” (page 366)
•   “Oracle Database 10gR2 RAC Oracle Clusterware Daemon Failure” (page 366)

Node Failure

When a node in a cluster fails, all multi-node package (MNP) instances running on the failed node also fail. Failover packages fail over to the next available adoptive node. If no other adoptive node is configured and available in the cluster, the failover package fails and is halted.

When a node in the Metrocluster environment is restarted, the active RAC MNP stack instances on the node are halted before the node restarts. Once the node is restarted and rejoins the cluster, the active RAC MNP package instances on the site that have the AUTO_RUN flag set to YES start automatically. If the RAC MNP stack packages have the AUTO_RUN flag set to NO, the RAC MNP package instances must be started manually on the restarted node.

When a node on which the Site Controller package is running is restarted, the Site Controller package fails over to the next available adoptive node. Based on the adoptive node on which the Site Controller package is started and the status of the active RAC MNP stack, the Site Controller package performs a site failover, if necessary.

Site Failure

A site failure is a scenario where a disaster or an equivalent failure results in all nodes in a site failing or going down.
The Serviceguard cluster detects this failure, and reforms the cluster without the nodes from the failed site. The Site Controller package that was running on a node on the failed site fails over to an adoptive node in the remote site. When the remote site starts, the Site Controller package detects that the active RAC MNP stack packages have failed and initiates a site failover by activating the passive RAC MNP stack packages that are configured in the current site. The disaster tolerant RAC databases that have their active RAC MNP stack on the surviving site, where the cluster reformed, continue to run without any interruption. Site Failover When the Site Controller package determines that a running RAC MNP stack of a disaster tolerant RAC database has failed in the Metrocluster, or that the site hosting it has failed, it fails over to the remote site node and initiates a site failover from the remote node. The site failover starts the adoptive RAC MNP stack by starting the RAC MNP stack packages configured on the remote site. The Site Controller package monitors the active RAC MNP stack packages, according to the configuration, to detect a failure and initiate a site failover. When the RAC MNP package is configured using the critical_package attribute, the Site Controller 362 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture package detects and initiates a site failover based on the RAC MNP package status. In a configuration where all the packages in the RAC MNP stack are configured with the managed_package attribute, the Site Controller package detects a failure and initiates site failover based on the cumulative status of all the configured MNP packages. An MNP package that has failed or is halted, in addition to displaying a DOWN state, also displays a Halted status. A special flag package_halted is set to NO when the MNP package is down, having failed in the cluster. This special flag is set to Yes when the MNP package is down and manually halted. Serviceguard sets this flag to NO only when the last surviving instance of the MNP is halted as a result of a failure. The flag is set to Yes if the last surviving instance is manually halted, even if other instances are halted earlier due to failures. The Site Controller package determines a failure by checking if the package_halted flag is set to NO for all monitored MNP packages that are in the DOWN state. When the monitored packages have failed but not halted, the Site Controller package fails over to a remote site node to perform a site failover. Before starting the RAC MNP stack configured at the remote site, the Site Controller package ensures that it is safe to do so. The failed RAC MNP stack packages might not have halted cleanly, leaving stray processes and resources. In such scenarios, it is not safe to start the identical RAC database on the remote site. As a result, when it starts on the remote site node, the Site Controller package checks whether all instances of the failed active RAC MNP stack packages have halted cleanly. The Site Controller package checks the last_halt_failed flag for each instance of the RAC MNP stack packages. The flag is set to TRUE for an instance whose halt script execution resulted in an error. Even if one instance of any of the failed RAC MNP stack package did not halt successfully, the Site Controller package aborts site failover. In these circumstances, the Site Controller package halts and its status is displayed as FAILED on the remote site node. 
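You can inspect the same halt status manually when diagnosing an aborted site failover. The following is a minimal sketch, assuming the sfo_hrdb RAC MNP package name used in the examples in this chapter; the exact set of fields reported by the line-format output can vary between Serviceguard releases, so verify them on your cluster.

# Display the per-instance attributes of the RAC MNP package, including
# the halt status flags, in name=value form.
cmviewcl -v -f line -p sfo_hrdb | grep -i last_halt_failed

An instance that reports a failed last halt did not run its halt script to completion on that node, which is the condition that causes the Site Controller package to abort the site failover.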
To restart the Site Controller package and RAC databases, the nodes on the site need to be manually cleaned. After ensuring a clean halt for all instances of the failed RAC MNP stack packages, the Site Controller package performs the following steps to activate the corresponding passive RAC MNP stack configured in its current site: 1. Closes the Site Safety Latch for the failed RAC MNP package nodes. 2. Waits for all configured packages as part of the failed RAC MNP stack to halt successfully. 3. Deports the CVM disk groups used by the database on the failed site. 4. Prepares the replicated data storage on the current site using the Metrocluster environment file on the node it is starting. 5. Imports the CVM disk groups used by the database in the current site. 6. Opens the Site Safety Latch in the current site. 7. Starts the RAC MNP stack packages configured for the database in the current site. Understanding Site Failover in a Site Aware Disaster Tolerant Architecture 363 For the Site Controller package to successfully start the remote RAC MNP stack, the packages in the remote MNP stack must have node switching enabled on their configured nodes. When the Site Controller package fails to start after successfully preparing the storage on a site, it sets the Site Safety Latch to a transient state, which is displayed as INTERMEDIATE. When the Site Safety Latch is in the INTERMEDIATE state, the corresponding Site Controller package can be restarted only after cleaning the site where it previously failed to start. For more information on cleaning the Site Controller package, see “Cleaning the Site to Restart the Site Controller Package” (page 376) Site Controller Package Failure The Site Controller package can fail for many reasons, such as node crash, while the active RAC MNP stack on the site is up and running. The Site Controller package fails over to an adoptive node, which can be a node on the same site or a node on the remote site. The Site Controller package behavior is different under each scenario so that the RAC database availability is not disrupted. NOTE: When the adoptive node is a node in the same site, where the current active RAC MNP stack is running, it is considered as a local failover for the Site Controller package. On a Site Controller package local failover, the disaster tolerant RAC database remains uninterrupted on that site. The Site Controller package continues to monitor the managed packages or the critical package on the site, as configured from the current node. When the Site Controller package fails over to an adoptive node at the remote site, it is considered a failover across sites for the Site Controller package. When a Site Controller fails over across sites, while the active the RAC MNP stack is running in the site, the Site Controller package fails on the remote site adoptive node without affecting the running active RAC MNP stack in the cluster. The RAC database continues to be available in the cluster. However, as the Site Controller package has failed in the cluster, the RAC databases can no longer automatically failover to the remote site. Network Partitions Across Sites A network partition across sites is similar to a site failure. The Serviceguard cluster nodes on both sites detect this failure and try to reform the cluster using the Quorum Server. The nodes from only one of the sites will receive the quorum and form the cluster. The nodes on the other site restart and deliberately fail the active RAC MNP stack running on them. 
The Site Controller package running on the site nodes that failed to form the cluster will now fail over to the adoptive node on the site where the cluster is reformed. When the Site Controller package starts on the adoptive node at the remote site, it detects that the active RAC MNP stack has failed. Consequently, the Site Controller package 364 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture performs a site failover and starts the corresponding RAC MNP stack packages on the site where the cluster has reformed. Disk Array and SAN Failure When a disk array or the host access SAN at a site fails, the active RAC database running on the site could hang or fail based on the component that has failed. If the SAN failure causes the RAC database processes to fail and consequently the RAC MNP stack packages also fail, the Site Controller package initiates a site failover. Replication Link Failure A failure in a replication link between sites stalls the replication from the active RAC MNP stack to the remote site. The impact of a replication link failure on the running RAC database is based on the configured replication mode. On a synchronized replication mode, with fence level set to Data, the primary site disk array starts failing I/Os. This causes the active RAC database to fail. The Site Controller package then performs a site failover, if the RAC MNP package is configured as a critical_package. If the fence level is set to Never, the I/O on the PVOL side is not failed, and the active RAC database continues to run successfully. On an asynchronous replication mode, there is no interruption at the active RAC database and it continues to run uninterrupted. When the RAC database is mounted as read only or is idle or is completing read only transactions when the replication link fails, it may not encounter any failure and continues to be available from the site. Oracle Database 10gR2 RAC Failure When failures, such as tablespace corruption, or errors arising out of insufficient storage space, occur, the RAC database instance processes on the nodes fail. When the Oracle RAC database instance fails at a site, the RAC MNP package instance containing it also fails. The Site Controller package that monitors the RAC MNP package detects that the RAC MNP has failed. The database failure is handled based on the manner in which the RAC MNP stack is configured with the Site Controller package. When the RAC MNP package is configured as a critical_package, the Site Controller package considers only the RAC MNP package status to initiate a site failover. Since the RAC MNP package fails when the contained RAC database fails, the Site Controller package fails over to start on the remote site node and initiates a site failover from the remote site. When the RAC MNP package is configured as a managed_package along with other packages in the stack, such as the CFS MP and CVM DG packages, the Site Controller package considers the status of all configured packages to determine a failure. When the RAC database fails, only the RAC MNP package fails. All other managed packages Understanding Site Failover in a Site Aware Disaster Tolerant Architecture 365 continue to be up and running. As a result, the Site Controller package does not perform a site failover. The Site Controller package only logs a message in the syslog and continues to run on the same node where it was running before the RAC database failed. Manual intervention is required to restart the RAC database MNP package. 
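In the managed_package case described above, a typical manual recovery is to resolve the database problem and then restart only the RAC MNP package on the same site. The following is a minimal sketch, assuming the sfo_hrdb package and the SFO_1 and SFO_2 node names used in the examples in this chapter; the actual package and node names depend on your configuration.

# Review why the RAC MNP package failed (package log file and Oracle alert logs).
cmviewcl -v -p sfo_hrdb

# Re-enable node switching for the RAC MNP package on the site nodes,
# in case it was disabled by the failure.
cmmodpkg -e -n SFO_1 -n SFO_2 sfo_hrdb

# Restart the RAC MNP package on the active site. The Site Controller package
# continues to run and does not treat this restart as a site failover.
cmrunpkg sfo_hrdb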
Oracle Database 10gR2 RAC Instance Failure Certain error conditions in the run time environment of a node can cause the Oracle RAC database instance on the node to fail. This, in turn, causes the corresponding RAC MNP package instance on the node to go down. The RAC MNP package continues to run with one less instance being up and the Site Controller package continues to monitor the RAC MNP stack. However, if the failed RAC database instance is the last surviving instance, the RAC MNP package is halted, after failing in the cluster. The Site Controller package detects the failure and initiates a site failover if the RAC MNP is configured as a critical_package. Oracle Database 10gR2 RAC Oracle Clusterware Daemon Failure The Oracle Clusterware is an essential resource for all RAC databases in a site. When the crsd or evmd daemons are aborted on account of a failure, they are automatically restarted on the node. When the cssd daemon is aborted on account of a failure on a node, the node is restarted. The RAC MNP stack continues to run with one less instance on the site. The Site Controller package continues to run uninterrupted as long as there is at least one RAC MNP instance running and the RAC MNP package has not failed. However, if the failed RAC database instance is the last surviving instance on the site, when the node is restarted, it initiates a failover of the Site Controller package to the remote site. The Site Controller package, during startup at the remote site, will detect the failure and perform a site failover starting up the RAC MNP stack configured in that site. Administering the Site Aware Disaster Tolerant Metrocluster Environment This section describes the procedures that you must perform to administer the SADTA environment. This section elaborates the procedures using the Oracle database 10gR2 RAC workload as an example. This section addresses the following topics: • Maintaining a Node • Online Addition and Deletion of Nodes • Maintaining the Site • Maintaining the Metrocluster Environment File • Maintaining Site Controller Package • Starting a Disaster Tolerant Oracle Database 10gR2 RAC • Shutting Down a Disaster Tolerant Oracle Database 10gR2 RAC 366 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture • • • • Halting and Restarting the RAC Database MNP Packages Maintaining Oracle Database 10gR2 RAC MNP packages on a Site Maintaining Oracle Database 10gR2 RAC Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site Maintaining a Node To perform maintenance procedures on a cluster node, the node must be removed from the cluster. Run the cmhaltnode -f command to move the node out of the cluster. This command halts the RAC MNP stack instance running on the node. As long as there are other nodes in the site and the Site Controller package is still running on the site, the site aware disaster tolerant database continues to run with one less instance on the same site. However, if the node that needs to be halted in the cluster is the last surviving node in the site, then the Site Controller packages running on this node fail over to the other site. In such scenarios, the site aware disaster tolerant database must be moved to the remote site before halting the node in the cluster. For more information on moving a site aware disaster tolerant RAC database to a remote site, see “Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site” (page 374). 
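For example, before halting a node for maintenance, confirm from the cluster view that it is not the last running node in its site, and return the node to the cluster when maintenance is complete. A minimal sketch, assuming the SFO_2 node name from the earlier examples:

# Confirm that other nodes in the site are still up before halting this one.
cmviewcl

# Halt the node; the -f option also halts the package instances running on it.
cmhaltnode -f SFO_2

# After maintenance is complete, return the node to the cluster.
cmrunnode SFO_2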
Online Addition and Deletion of Nodes Metrocluster requires equal number of nodes to be configured at the primary and remote data centers. Therefore, whenever a RAC database instance is added or deleted at primary site, you must add or delete the replica database instance at the remote site as well. Online node addition involves procedures on both the sites of the redundant RAC database configuration. 1. Online node addition on the primary site where the RAC Database package stack is running. 2. Online node addition on the remote site where the RAC Database package stack is down. Similarly, online node deletion also involves performing the following tasks. 1. Online node deletion on the primary site where the RAC Database package stack is running. 2. Online node deletion on the remote site with where the RAC Database package stack is down. Administering the Site Aware Disaster Tolerant Metrocluster Environment 367 NOTE: It is recommended to add or delete nodes online when the Site Controller package is halted in the detached halt mode. Adding Nodes Online on a Primary Site where the RAC Database is Running Complete the following procedure to add nodes online on a primary site where the RAC database package stack is running: 1. Install the required software on the new node and prepare the node for Oracle installation. 2. Halt the Site Controller package in the detached halt mode to avoid unnecessary site failover of the RAC database. 3. Ensure that the new node can access the CRS OCR and VOTE disks, and Oracle database disks and add the node to the Serviceguard cluster. 4. Extend the CRS software to the new node. For more information on extending the CRS software, see the Oracle Database 10gR2 RAC documentation available at the Oracle site. 5. Modify the configured CRS package using the HP SGeRAC toolkit to include the details of the new node and start CRS package on the new node. 6. Extend the Oracle database software on the new node. For more information on extending software on a new node, see the Oracle Database 10gR2 RAC documentation available at the Oracle site. 7. Prepare storage for RAC database on new node. For more information on preparing the storage device, see the Volume Manager Operations manual. 8. Add a database instance to the new node. For more information on adding a database instance, see the Oracle Database 10gR2 RAC documentation available at the Oracle site. 9. Modify the Oracle database package to add the details of the new node. 10. Start Oracle database package on the added node. 11. Re-configure the Site Controller package to include the details of the new node. Adding Nodes Online on a Remote Site where the RAC Database is Down Complete the following procedure to add a node online on a remote site where the RAC database package stack is down: 1. Prepare the storage for RAC database instance. a. For CFS or CVM, extend the RAC DG or the MP MNP packages to the node that must be added. b. For SLVM, import the RAC database volume groups on the node that must be added. 2. 368 Copy the RAC database instance pfile and password files from the added node at the other site to the node that is added at the current site. Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture 3. 4. 5. 6. Create the required Oracle database admin directories. Add a listener for the database on the node using the Oracle Network Configuration Assistant (NETCA). Create a tnsnames.ora entry for the new instance on all the nodes in the site. 
Register the new database instance with the CRS subcluster on the remote site:

   # srvctl add instance -d hrdb -i hrdb3 -n SJC_3

7. Modify the RAC package configuration on the remote site to add the details of the new node.
8. Start the Site Controller package.

Deleting Nodes Online on the Primary Site where the RAC Database Package Stack is Running

Complete the following procedure to delete nodes online on the primary site where the RAC database package stack is running:

1. Halt the Site Controller package in the detached halt mode.
2. Halt the RAC database package only on the node that must be deleted. Then remove the RAC database package configuration. For more information on halting the package and removing the RAC database, see the Serviceguard Extension for Oracle RAC toolkit documentation.
3. Delete an instance from the RAC database. For more information on deleting an instance, see the documentation available at the Oracle documentation site.
4. Delete the RAC database software and Oracle CRS. For more information on deleting the RAC database and Oracle CRS, see the documentation available at the Oracle documentation site.
5. Remove the node from the node list of the Site Controller package.
6. Run the cmhaltnode command to halt the cluster on this node.
7. Remove the node from the cluster configuration. For more information on removing a node from the cluster configuration, see the Managing Serviceguard manual available at http://docs.hp.com.

Deleting Nodes Online on the Site where the RAC Database Package Stack is Down

Complete the following procedure to delete nodes online on the site where the RAC database package stack is down:

1. Remove the RAC MNP package instance configuration on the node that must be deleted.
2. Remove access to the RAC database storage from this node by removing the storage configuration. For more information on removing access to the RAC database storage device, see the CFS, CVM, and SLVM documentation.
3. Clear the registration of the RAC database instance with the CRS subcluster on the site using the following command:

   # srvctl remove instance -i hrdb3 -d hrdb

4. Remove the tnsnames.ora entry for the instance being deleted on all nodes in the site.
5. Remove the RAC and CRS software on the node. For more information on removing Oracle RAC and CRS, see the Oracle Database 10gR2 RAC documentation.
6. Remove the node from the node list of the Site Controller package.
7. Halt the cluster on this node using the cmhaltnode command.
8. Remove the node from the cluster configuration. For more information on removing nodes from the cluster configuration, see the Managing Serviceguard manual.
9. Start the Site Controller package.

Maintaining the Site

Maintenance operations at a site might require all the nodes on that site to be down. In such scenarios, the site aware disaster tolerant database can be started on the other site to provide continuous service. For more information on moving a site aware disaster tolerant RAC database to a remote site, see “Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site” (page 374).

Maintaining the Metrocluster Environment File

The Metrocluster environment file is available in the package directory of the Site Controller package. Follow all the rules and guidelines of Metrocluster while modifying the environment file. The changes you make to the environment file must be repeated on all the appropriate nodes.
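For example, after editing the environment file on one node, the same file must be copied to the Site Controller package directory on the other configured nodes. The following is a minimal sketch, assuming the /etc/cmcluster/hrdb_sc package directory from the earlier examples and an environment file name of hrdb_sc_xpca.env; the actual file name follows the Metrocluster naming convention for your replication type, and the sketch assumes remote copy access (rcp or scp) is set up between the nodes.

# Distribute the edited Metrocluster environment file to the other nodes.
# Node names and the environment file name are illustrative.
for node in SFO_2 SJC_1 SJC_2
do
    rcp /etc/cmcluster/hrdb_sc/hrdb_sc_xpca.env \
        ${node}:/etc/cmcluster/hrdb_sc/
done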
Maintaining Site Controller Package The Site Controller package is a Serviceguard failover package. The package attributes that can be modified online can be modified without halting the Site Controller package. Certain package attributes require that the Site Controller package is halted. Halting the Site Controller package halts the workload packages and closes the Site Safety Latch on the site. The Detached mode flag allows the Site Controller package to halt without halting the workload packages. Complete the following steps to halt the Site Controller package in the Detached mode: 1. Identify the node where the Site Controller package is running. cmviewcl –p 2. Login to the node where the Site Controller package is running and go to the Site Controller package directory. cd < dts_pkg_dir > 370 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture 3. Run the HP-UX touch command with the Detached flag, in the Site Controller package directory touch DETACHED 4. Halt the Site Controller package. cmhaltpkg The Site Controller package halts without halting the RAC MNP stack packages. The Site Controller package leaves the Site Safety Latch open on this site. The detach mode file is removed by the Site Controller package when it halts. After the maintenance procedures are complete, restart the Site Controller package on the same site where it was previously halted in the detached mode. You cannot start the Site Controller package on a different site node. Run the following command to start the Site Controller package: cmrunpkg Enable global switching for the Site Controller package. cmmodpkg –e When the Site Controller package is halted in detached mode, the active RAC MNP stack on the site can be halted and restarted at the same site as the Site Safety Latch is still open in the site. Starting a Disaster Tolerant Oracle Database 10gR2 RAC The disaster tolerant RAC database can be started in a Metrocluster by starting the Site Controller package of the corresponding database. Complete the following procedure to start the disaster tolerant database: 1. Ensure that the CRS MNP package on the site is up and running. cmviewcl –p 2. If you have CVM/CFS configured, ensure that the Serviceguard CFS SMNP package is also up and running in the Metrocluster. cmviewcl –p SG-CFS-pkg 3. Ensure that the Site Controller package is enabled on all nodes in the site where the database must be started. cmmodpkg –e –n -n 4. Start the Site Controller package by enabling it. cmmodpkg –e The Site Controller package starts on the preferred node at the site. At startup, the Site Controller package starts the corresponding RAC MNP stack packages in that site that Administering the Site Aware Disaster Tolerant Metrocluster Environment 371 are configured as managed packages. After the RAC MNP stack packages are up, you must check their package log files for any errors that could have occurred at startup. If the CRS MNP instance on a node is not up, the RAC MNP stack instance on that node does not start. However, if CVM/CFS is configured, the CVM DG and CFS MP MNP will start. Shutting Down a Disaster Tolerant Oracle Database 10gR2 RAC The disaster tolerant RAC database can be shutdown by halting the Site Controller package of the corresponding database. To shutdown the database, run the following command on any node in the cluster: cmhaltpkg This command halts the Site Controller package and the current active RAC MNP stack of the database. 
After shutting down, check the Site Controller package log file and the RAC MNP stack package log files to ensure that the database shut down appropriately. It is recommended that you manage the RAC database startup and shutdown using the package administration commands. Halting the RAC database using the Oracle interfaces, such as srvctl and sqlplus, will cause the RAC MNP package to fail. The Site Controller package interprets this as a failure and initiates a site failover, which is not necessary. Halting and Restarting the RAC Database MNP Packages The RAC MNP stack of the active database of a disaster tolerant RAC database can be shutdown without impacting the remaining disaster tolerant infrastructure. The Site Controller package can continue to run on the site where the active database is running. Run the following command to halt the RAC MNP package of the active database: cmhaltpkg To halt all RAC MNP stack packages, including the CFS MP and CVM DG MNP packages, specify all the package names with the cmhaltpkg command. This command halts all RAC MNP stack packages. The Site Controller package continues to run and does not initiate a site failover. You can restart the RAC MNP stack later on the same site using the cmrunpkg command as long as the Site Controller package is running on this site. However, special care must be taken when the Site Controller package is halted after halting the active RAC MNP stack on a site. The disaster tolerant RAC database cannot be started by starting the RAC MNP stack packages as the Site Safety Latch is closed on both sites when the Site Controller package is halted in the cluster. In this case, the disaster tolerant RAC database can be started only by restarting the Site Controller package in the cluster. To restart the database at the same site, the Site Controller package must be started on that site. 372 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Maintaining Oracle Database 10gR2 RAC MNP packages on a Site The RAC MNP package is a SGeRAC toolkit package. To complete maintenance procedures on the RAC MNP package, follow the procedures recommended by the SGeRAC toolkit for RAC MNP package maintenance. A maintenance operation on the RAC MNP package at a site can also involve halting the RAC MNP package. Run the following command to halt the RAC MNP. cmhaltpkg Because the RAC MNP is down, as it is halted in the cluster, the Site Controller package does not interpret it as a failure. The Site Controller package continues to run on the same site and the Site Safety Latch will remain open. After the maintenance procedures are complete, restart the RAC MNP package at the same site. Run the following command to restart the RAC MNP package: cmrunpkg However, if the Site Controller package is halted while the RAC MNP package is halted for maintenance, it results in the RAC MNP stack on that site shutting down. In this scenario, the RAC MNP package can only be started by restarting the Site Controller package. This is because the Site Safety Latch closes when the Site Controller package halts. Maintaining Oracle Database 10gR2 RAC A RAC database configured using SADTA has two replicas of the RAC database configuration; one at each site. The database configuration is replicated between the replicas using a replicated storage. Most of the maintenance changes done at the site with the active database configuration is propagated to the other site. 
However, some changes, such as service configuration and Oracle patch installation, are not replicated. Such maintenance operations must be performed on the RAC database configurations at both sites. For more information on maintaining an Oracle RAC database, see the Oracle documentation. You must also see the Serviceguard Extension for Oracle RAC manual available at http://docs.hp.com.

Maintenance operations that require the RAC database or one of its instances to be halted require the corresponding SGeRAC toolkit MNP package, or its instance, to be halted in the cluster. Run the following command to halt the SGeRAC MNP package:

   cmhaltpkg <RAC_MNP_package_name>

The Site Controller package does not interpret halting of the RAC MNP stack package as a failure and continues to run uninterrupted. Once the maintenance procedures are complete, restart the RAC MNP package. Run the following command to restart the RAC MNP package:

   cmrunpkg <RAC_MNP_package_name>

However, if the Site Controller package is halted while the RAC MNP package was halted for maintenance, the RAC MNP stack can only be started by restarting the Site Controller package.

Moving a Site Aware Disaster Tolerant Oracle RAC Database to a Remote Site

To perform maintenance operations that require the entire site to be down, you can move the disaster tolerant Oracle RAC database to a remote site. To move the RAC database to a remote site, the local RAC database configuration must first be shut down and then the remote RAC database configuration must be started. Complete the following procedure to move the database to a remote site:

1. Halt the Site Controller package of the RAC database.

   cmhaltpkg <site_controller_package_name>

2. Ensure that the RAC MNP stack packages are halted successfully.

   cmviewcl -l package

3. Start the Site Controller package on a node in the remote site.

   cmrunpkg -n <node_in_remote_site> <site_controller_package_name>

The Site Controller package starts up on a node in the remote site and starts the RAC MNP stack packages that are configured.

Limitations of a Site Aware Disaster Tolerant Architecture

Following are the limitations of SADTA:

•   No Support for Arbitrator Nodes
    When configuring Metrocluster for applications using CFS sub-clustering storage, arbitrator nodes must not be configured. In such configurations, the Metrocluster must use only a Quorum Server at a third location to handle network partitions. Additionally, in a cross-subnet environment, appropriate network routes to the Quorum Server at the third location must be configured on the nodes at both sites.

•   Site Failover Behavior based on NODE_NAME Order
    When the Site Controller package fails over to the remote site while the workload package is operational at the local site, it does not immediately start up at the remote site. The local site workload package continues to run, and the Site Controller package must be manually restarted at the local site. To minimize the need for such intervention, the nodes at the preferred site must appear first in the node list of the Site Controller packages, followed by the nodes at the adoptive site. A similar scenario can occur after a remote failover, when the nodes of the local site are re-enabled to run the Site Controller package, and the node running the Site Controller package fails. Serviceguard will fail the Site Controller package to the local site, where it will fail to start up. The Site Controller package at the remote site must be manually restarted.
To minimize the need for such intervention, rearrange the node list to put the adoptive nodes first, before re-enabling the local site to run the Site Controller package. • VERITAS Cluster Volume Manager standalone software is not supported in SADTA with Metrocluster. Troubleshooting This section describes the procedures to troubleshoot errors and error messages that occur in a SADTA environment. This section addresses the following topics: • Logs and Files • Cleaning the Site to Restart the Site Controller Package • Identifying and Cleaning RAC MNP Stack Packages that are Halted • Understanding Site Controller Package Logs Logs and Files This section describes the guidelines to be considered before working with files that need to be configured for SADTA. Following are some of the guidelines to consider: • The Site Controller package control log file can be specified using the attribute script_log_file in the Site Controller package configuration file. Serviceguard defaults the Site Controller package logs to the default log destination. The default log destination for a given Site Controller package is /var/adm/cmcluster/ log .log. • It is recommended that you specify a file path under the directory specified for the dts_pkg_dir attribute in the Site Controller package configuration file. • The Metrocluster storage preparation modules will also log to the Site Controller package log file. • The RAC MNP package at each site will log in a file named .log under its package directory on the site nodes. • The CFS MP MNP packages will log to a file named /etc/cmcluster/cfs/ .log on their corresponding CFS sub-cluster nodes. • The CVM DG MNP packages will log to a file named /etc/cmcluster/cfs/ .log on their corresponding CFS sub-cluster nodes. • The Site Safety Latch mechanism logs are saved in the/etc/opt/resmon/log/ api.log directory. Troubleshooting 375 Cleaning the Site to Restart the Site Controller Package The Site Controller package startup on a site can fail for various reasons. The Site Safety Latch will be in a special state: INTERMEDIATE. In this state, the Site Controller package cannot be started from either of the sites. A special tool called cmresetsc is provided to cleanup a site under this condition. This tool will cleanup the Metrocluster environment in site for the Site Controller package, it does not provide cleanup for the Site Controller managed packages. Any issues in the packages managed by Site Controller package must be fixed by the operator and the package’s node switching must be enabled before restarting the Site Controller package. When the Site Controller package has failed, the following message will appear in the Site Controller package log file at the node where it was last running or started. Check the package log files on all nodes of all other packages managed by the Site Controller package to identify issues in those packages. Perform the following steps to clean a site for a site aware disaster tolerant application: 1. Clean the Site Safety Latch on the site by running the cmresetsc tool. You must be root user to use this command. On a node from the site, run the following command: /usr/sbin/cmresetsc 2. 3. Check the package log file of the Site Controller package on the node it failed and fix any reported issues. Enable node switching for the Site Controller package on that node. cmmodpkg –e -n 4. 5. Check the package log file on all nodes of the MNP packages managed by Site Controller package on the site. 
Fix any issues reported in the package log files and enable node switching for the MNP packages on the failed nodes. cmmodpkg -e -n -n 6. Restart the Site Controller package and enable global switching. cmrunpkg cmmodpkg -e In addition to the cmresetsc tool, use the cmviewsc tool, a contrib tool, to view information on the Site Controller package managed workload package. This tool is available in the /opt/drenabler/utils/ directory. Run the following command to use the cmviewsc tool. cmviewsc [-v] [Site_Controller_package_name] 376 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture This tools displays the following information: • The number of critical and managed packages at each site. • Status of the Site Controller managed packages (halted or started) • Site Controller managed packages halted cleanly or not • Site active or passive • Site Safety latch value on each node Identifying and Cleaning RAC MNP Stack Packages that are Halted The Site Controller package does not start if the RAC MNP stack packages are not halted clean. An MNP package is halted unclean when the halt script does not run successfully on all the configured nodes of the package. This implies that there might be some stray resources, configured with the package, that are online in the cluster. The Site Controller package logs the following message in its log file on the node where it failed to start: Package has not halted cleanly on node The following command shows whether an MNP package halt was clean or unclean: cmviewcl –v –f line Check for the field last_halt_failed under each instance of the MNP package. When set to Yes, that instance of the MNP package did not successfully execute the halt script when it was halted. Check for all instances. The unclean nodes might have stray resources. See the MNP package log file on the corresponding node to identify the reason for the halt script run failure. Clean any stray resources that are still online in the node and enable node switching on the node for the package. This clears the flag and allows the Site Controller package to start. Complete this procedure for all nodes where the MNP package instance has halted unclean. cmmodpkg –e -n Understanding Site Controller Package Logs This section describes the various messages that are logged in the log files and the methods to resolve those error messages. HP Serviceguard enables the SADTA feature only when the Metrocluster software, with the Site Controller package functionality, and SGeRAC is installed on all nodes in the cluster. If the necessary software products are not installed, the cmapplyconf and cmcheckconf commands will fail and one of the following error messages is displayed: SGeRAC sub-clustering functionality is not installed. Metrocluster Site Controller feature is not installed. Table 7–4 describes the error messages that are displayed and the recommended resolution. Troubleshooting 377 Table 7-4 Error Messages and their Resolution Log Messages Cause Resolution Starting Site Controller (hrdb_sc) on site siteB. The critical package at siteA is still running. 1. Clean the nodes on siteA and enable node switching for the Site Controller package. 2. Restart the Site Controller package on siteA. An MNP package managed by Site Controller at siteA Site safety latch at site siteA is open. has not halted clean Checking if site failover conditions are in at least one of its met. nodes. Package hrdb has not halted cleanly on node . 1. 
Check the MNP package Log on all the nodes it failed to run the halt script successfully. 2. Clean any stray resources that are still online on the node. 3. Re-enable node switching for the MNP package on the node. 4. Restart the Site Controller package. Site safety latch at site siteA is open. Checking if site failover conditions are met. Critical package hrdb at site siteA is up. Error: Site failover conditions are not met. Unexpected Site Controller startup. Unable to initiate site failover at site SiteB. Site Controller startup failed. Starting Site Controller (hrdb_sc) on site siteB. Unable to initiate site failover at site siteB. Site Controller startup failed. Starting Site Controller (hrdb_sc) on site siteB. Site safety latch at site siteA is open. Checking if site failover conditions are met. The Site Controller was halted in the Detached mode at site siteA. Start the Site Controller package on the site where it was previously halted in the detached mode. Error: Site failover conditions are not met. Unexpected Site Controller startup. Unable to initiate site failover at site siteB. Site Controller startup failed 378 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Table 7-4 Error Messages and their Resolution (continued) Log Messages Cause Unable to execute command. A package Dependency on the following packages dependency condition is not met not met: crspkg. on this site. cmrunpkg: Unable to start some package or package instances. Check the log files of the packages managed by Site Controller for more details. Refer to Metrocluster documentation for cleanup procedures needed before restarting the Site Controller. Resolution 1. Check the log file of the failed package. 2. Identify and fix the problem. It is possible that the CRS MNP packages are not running. 3. Enable node switching for the failed MNP package. 4. Clean the site using the cmresetsc tool. 5. Restart the Site Controller package. Site Controller startup failed. Starting Site Controller (hrdb_sc) on site siteA. The Site Controller has failed to startup on this site. 1. Check the log files of all the packages managed by Site Controller package on the site. 2. Identify the issues and fix them. 3. Enable node switching for the package managed by Site Controller package on the site. 4. Clean the site using the cmresetsc tool. 5. Restart the Site Controller package. The Site Controller has failed to startup Site Controller start up on the site siteA on this site. has failed. 1. Check the log files of all the packages managed by Site Controller package on the other site. 2. Identify and issues and fix them. 3. Enable node switching for the package managed by Site Controller package on the other site. 4. Clean the other site using the cmresetsc tool on a node in the other site. 5. Restart the Site Controller package. Previous start up of Site Controller (hrdb_sc) on the site siteA has failed. Clean up the site siteA and start hrdb_sc again. Refer to Metrocluster documentation for cleanup procedures needed before restarting the Site Controller. Site Controller startup failed. Starting Site Controller (hrdb_sc) on site siteB. Clean up the site siteA and start hrdb_sc again. Refer to Metrocluster documentation for cleanup procedures needed before restarting the Site Controller. Site Controller startup failed. Closing site safety latch at site siteA. The Site Controller 1. Check the package log file on the node shutdown detected it has failed to halt successfully. 
Checking all packages managed by Site that a package failed 2. Clean any stray resources owned by Controller at siteA: hrdb hrdb_dg to halt successfully. the package, that are still online on hrdb_mp flash-dg flash_mp have the node. halted. 3. Enable node switching for the package Package hrdb has not halted cleanly on the node. on node ccia7. Site Controller halt failed. Troubleshooting 379 Table 7-4 Error Messages and their Resolution (continued) Log Messages Cause Executing: cmrunpkg siteA_mg1 siteA_mg2 siteA_mg3. One of the package 1. Check the log file of the package on managed by Site the nodes where node switching is not Controller did not enabled. have node switching 2. Clean any stray resources owned by enabled on its the package, that are still online on configured nodes at the node. this site. 3. Enable node switching for the package on the nodes. 4. Clean the site using the cmresetsc tool. 5. Start the Site Controller package. Unable to run package siteA_mg1 on node ccia6, the node switching is disabled. Unable to run package siteA_mg1 on node ccia7, the node switching is disabled. cmrunpkg: Unable to start some package or package instances. Check the log files of the packages managed by Site Controller for more details. Resolution Refer to Metrocluster documentation for cleanup procedures needed before restarting the Site Controller. Site Controller startup failed. Failed to get local site name. There is no site defined for this node in the cluster configuration. Exiting (when sites are not defined) Failed to prepare the storage for site. The Site definitions in 1. Check the Serviceguard cluster the Serviceguard configuration file and reapply with cluster is no longer the sites defined appropriately. available. 2. Restart the Site Controller package. The preparation of 1. Check the host connectivity to disk the replicated disk arrays. and making it 2. Ensure that the replication read-write on the site management software is configured nodes failed. properly and is functioning correctly. 3. Restart the Site Controller package. Error: Metrocluster Environment file There is no 1. Restore the Metrocluster Environment does not exist in /etc/cmcluster/hrdb_sc Metrocluster file under the Site Controller package environment file in directory. the Site Controller 2. Check the Metrocluster Environment package directory on file is named using the Metrocluster the node where Site defined naming convention. Controller failed. 3. Restart the Site Controller package. Error: More than one environment files There is more than 1. Remove the redundant file. are found in /etc/cmcluster/hrdb_sc. one Metrocluster 2. Restart the Site Controller package. Environment file in the Site Controller package directory on the node where Site Controller failed. 380 Designing a Disaster Tolerant Solution Using Site Aware Disaster Tolerant Architecture Table 7-4 Error Messages and their Resolution (continued) Log Messages Cause Resolution Error: Failed to import CVM DG The Site Controller 1. Check the syslog on the site CVM failed to import the Master Node. CVM Disk Group on 2. Check the disk group status using the site it failed to VxVM commands. start. 3. Identify and fix any issues. 4. Restart the Site Controller package. Error: Failed to deport CVM DG The Site Controller 1. Check the syslog on the site CVM failed to deport the Master Node. CVM Disk Group on 2. Check the disk group status using the other site. VxVM commands. 3. Identify and fix any issues. 4. Restart the Site Controller package. 
Troubleshooting 381 382 A Environment File Variables for Serviceguard Integration with Continuous Access XP This appendix lists all Environment File variables that have been modified or added for disaster tolerant Serviceguard solutions that employ Continuous Access XP. It is recommended that you use the default settings for most of these variables, so exercise caution when modifying them: (Default=1) AUTO_FENCEDATA_SPLIT This parameter applies only when the fence level is set to DATA, which will cause the application to fail if the Continuous Access link fails or if the remote site fails. Values: 0—Do NOT startup the package at the primary site. Require user intervention to either fix the hardware problem or to force the package to start on this node by creating the FORCEFLAG file. Use this value to ensure that the SVOL data is always current with the trade-off of long application downtime while the Continuous Access link and/or the remote site are being repaired. 1—(DEFAULT) Startup the package at the primary site. Request the local disk array to automatically split itself from the remote array. This will ensure that the application will be able to startup at the primary site without having to fix the hardware problems immediately. Note that the new data written on the PVOL will not be remotely protected and the data on SVOL will be non-current. When the Continuous Access link and/or the remote site is repaired, you must manually use the command “pairresync” to re-join the PVOL and SVOL. Until that command successfully completes, the PVOL will NOT be remotely protected and the SVOL data will not be current. Use this value to minimize the down time of the application with the trade-off of having to manually resynchronize the pairs while the application is running at the primary site. If the package has been configured for a three data center environment, this parameter is applicable only when the package is attempting 383 to start up in either the primary (DC1) or secondary (DC2) data center. This parameter is not relevant in the recovery cluster or the third data center. Use this parameter’s default value in the third data center. AUTO_NONCURDATA 384 (Default=0) This parameter applies when the package is starting up with possible non-current data under certain Continuous Access pair states. During failover, this parameter will apply when the SVOL is in the PAIR or PFUL state and the PVOL side is in the PSUE, EX_ENORMT, EX_CMDIOE or PAIR (for Continuous Access Journal) state. During failback, this parameter will apply when the PVOL is in the PSUS state and the SVOL is in the EX_ENORMT or EX_CMDIOE state. When starting the package in any of the above states, you run the risk of losing data. Values: 0—(DEFAULT) Do NOT startup the application on non-current data. If Metrocluster/Continuous Access cannot determine the data is current, it will not allow the package to start up. (Note: for fence level DATA and NEVER, the data is current when both PVOL and SVOL are in PAIR state.) 1—Startup the application even when the data may not be current. 
Environment File Variables for Serviceguard Integration with Continuous Access XP NOTE: When a device group state is SVOL_PAIR on the local site and EX_ENORMT (Raid Manager or node failure) or EX_CMDIOE (disk I/O failure) on the remote site (this means it is impossible for Metrocluster/Continuous Access to determine if the data on the SVOL site is current), Metrocluster/Continuous Access conservatively assumes that the data on the SVOL site may by non-current and uses the value of AUTO_NONCURDATA to determine whether the package is allowed to automatically start up. If the value is 1, Metrocluster/Continuous Access allows the package to startup; otherwise, the package will not be started. NOTE: In a three data center environment, if the package is trying to start up in data center three (DC3), within the recovery cluster, only AUTO_NONCURDATA may be checked. All other AUTO parameters are not relevant when a package tries to start up on DC3. Use the two scenarios below to help you determine the correct environment settings for AUTO_NONCURDATA and AUTO_FENCEDATA_SPLIT for your Metrocluster/Continuous Access packages. Scenario 1: With the package device group fence level DATA, if setting AUTO_FENCEDATA_SPLIT=0, it is guaranteed that the remote data site will never contain non-current data (this assumes that the FORCEFLAG has not been used to allow the package to start up if the Continuous Access links or SVOL site are down). In this environment, you can set AUTO_NONCURDATA=1 to make the package automatically startup on the SVOL site when the PVOL site fails, and it is guaranteed the package data is current. (If setting AUTO_NONCURDATA=0, the package will not automatically startup on the SVOL site.) Scenario 2:When the package device group fence level is set to NEVER or ASYNC, you are not guaranteed that the remote (SVOL) data site still contains current data (The application can 385 continue to write data to the device group on the PVOL site if the Continuous Access links or SVOL site are down, and it is impossible for Metrocluster/Continuous Access to determine whether the data on the SVOL site is current.) In this environment, it is required to set AUTO_NONCURDATA=0 if the intention is to ensure the package application is running on current data. (If setting AUTO_NONCURDATA=1, the package will be started up on SVOL site whether the data is current or not.) AUTO_PSUEPSUS 386 (Default=0) In asynchronous mode, when the primary site fails, either due to Continuous Access link failure, or some other hardware failure, and we fail over to the secondary site, the PVOL will become PSUE and the SVOL will become PSUS(SSWS). During this transition, horctakeover will attempt to flush any data in the side file on the MCU to the RCU. Data that does not make it to the RCU will be stored on the bit map of the MCU. When failing back to the primary site any data that was in the MCU side file that is now stored on the bit map will be lost during resynchronization. In synchronous mode with fence level NEVER, when the Continuous Access link fails, the application continues running and writing data to the PVOL. At this point the SVOL contains non-current data. If there is another failure that causes the package to fail over and start on the secondary site, the PVOL will become PSUE and the SVOL will become PSUS(SSWS). When failing back to the primary site, any differential data that was on the PVOL prior to failover will be lost during resynchronization. 
Environment File Variables for Serviceguard Integration with Continuous Access XP NOTE: This variable is also used for the combination of PVOL_PFUS and SVOL_PSUS. When either the side file or journal volumes have reached threshold timeout, the PVOL will become PFUS. If there is a Continuous Access link, or some other hardware failure, and we fail over the secondary site, the SVOL will become PSUS(SSWS) but the PVOL will remain PFUS. Once the hardware failure has been fixed, any data that is on the MCU bit map will be lost during resynchronization. This variable will allow package startup if changed from default value of 0 to 1. If the package has been configured for a three data center (3DC) environment, this parameter is applicable only when the package is attempting to start up in either the primary (DC1) or secondary (DC2) data center. This parameter is not relevant in (the third data center) in the recovery cluster. Use this parameter’s default value in the third data center. Values: 0—(DEFAULT) Do NOT failback to the PVOL side after an outage to the PVOL side has been fixed. This will protect any data that may have been in the MCU side file or differential data in the PVOL when the outage occurred.1—Allow the package to startup on the PVOL side. We failed over to the secondary (SVOL) side due to an error state on the primary (PVOL) side. Now we're ready to failback to the primary side. The delta data between the MCU and RCU will be resynchronized. This resynchronization will over write any data that was in the MCU prior to the primary (PVOL) side failure. AUTO_PSUSSSWS (Default=0) This parameter applies when the PVOL is in the suspended state PSUS, and SVOL is in the failover state PSUS(SSWS). When the PVOL and SVOL are in these states, it is hard to tell which side has the good latest data. When starting the package in this state on the PVOL side, you run the risk of losing any changed data in the PVOL. 387 Values: 0—(Default) Do NOT startup the package at the primary site. Require user intervention to choose which side has the good data and resynchronizing the PVOL and SVOL or force the package to start by creating the FORCEFLAG file.1—Startup the package after resynchronize the data from the SVOL side to the PVOL side. The risk of using this option is that the SVOL data may not be a preferable one. If the package has been configured for a three data center (3DC) environment, this parameter is applicable only when the package is attempting to start up in either the primary (DC1) or secondary (DC2) data center. This parameter is not relevant in (the third data center) in the recovery cluster. Use this parameter’s default value in the third data center. AUTO_SVOLPFUS 388 (Default=0) This parameter applies when the PVOL and SVOL both have the state of suspended (PFUS) due to the side file reaching threshold while in Asynchronous mode only. When the PVOL and SVOL are in this state, the Continuous Access link is suspended, the data on the PVOL is not remotely protected, and the data on the SVOL will not be current. When starting the package in this state, you run the risk of losing any data that has been written to the PVOL side. Values: 0—(Default) Do NOT startup the package at the secondary site and allowing restart on another node. Require user intervention to either fix the problem by resynchronizing the PVOL and SVOL or force the package to start on this node by creating the FORCEFLAG. 1—Startup the package after making the SVOL writable. 
The risk of using this option is that the SVOL data may actually be non-current and the data written to the PVOL side after the hardware failure may be lost. This parameter is not required to be set if a package is configured for a three data center environment, because the three data center configuration does not support the Asynchronous mode of data replication. Leave this parameter with its default value in all data centers.

AUTO_SVOLPSUE (Default=0) This parameter applies when the PVOL and SVOL both have the state of PSUE. This state combination will occur when there is a Continuous Access link failure or other hardware failure, or when the SVOL side is in a PSUE state while we cannot communicate with the PVOL side. This applies only in the Asynchronous mode. The SVOL side will become PSUE after the Continuous Access link timeout value has been exceeded, at which time the PVOL side will try to flush any outstanding data to the SVOL side. If this flush is unsuccessful, then the data on the SVOL side will not be current.
Values: 0—(Default) Do NOT startup the package at the secondary site, and allow the package to try another node. Require user intervention to either fix the problem by resynchronizing the PVOL and SVOL or force the package to start on this node by creating the FORCEFLAG file. 1—Startup the package on the SVOL side. The risk of using this option is that the SVOL data may actually be non-current and the data written to the PVOL side after the hardware failure may be lost. This parameter is not required to be set if a package is configured for a three data center environment, because the three data center configuration does not support the Asynchronous mode of data replication. Leave this parameter with its default value in all data centers.

AUTO_SVOLPSUS (Default=0) This parameter applies when the PVOL and SVOL both have the state of suspended (PSUS). The problem with this situation is that we cannot determine the previous state: COPY or PAIR. If the previous state was PAIR, it is completely safe to startup the package at the remote site. If the previous state was COPY, the data at the SVOL site is likely to be inconsistent.
Values: 0—(Default) Do NOT startup the package at the secondary site. Require user intervention to either fix the problem by resynchronizing the PVOL and SVOL or force the package to start on this node by creating the FORCEFLAG file. 1—Startup the package after making the SVOL writable. The risk of using this option is that the SVOL data may be inconsistent and the application may fail. However, there is also a chance that the data is actually consistent, and it is okay to startup the application. If the package has been configured for a three data center environment, this parameter is applicable only when the package is attempting to start up in either the primary (DC1) or secondary (DC2) data center. This parameter is not relevant in DC3 (the third data center) in the recovery cluster. Use this parameter’s default value in the third data center.

CLUSTER_TYPE This parameter defines the clustering environment in which the script is used. It should be set to “metro” if this is a Metrocluster environment and “continental” if this is a Continentalclusters environment. A type of “metro” is supported only when the HP Metrocluster product is installed. A type of “continental” is supported only when the HP Continentalclusters product is installed.
If the package is configured for three data centers (3DC), the value of this parameter must be set to “metro” for DC1 and DC2 nodes and “continental” for DC3 nodes.

DEVICE_GROUP The Raid Manager device group for this package. This device group is defined in the /etc/horcm<#>.conf file. This parameter is not required to be set for a package configured for a three data center environment. Device groups for three data center packages use the new parameters described below.

3DC_TOPOLOGY This parameter is the differentiator between packages configured for two data centers (2DC) and three data centers (3DC). Unlike packages configured for two data centers (2DC), this parameter must be set for a package configured for a three data center environment. This parameter can have the following values:
1. multi-hop-bi-link: (DEFAULT) When the package is configured for 1:1:1, or cascaded, or multi hop topology with two Continuous Access links. The sync Continuous Access link is between DC1 and DC2, and the other link, the journaling Continuous Access link, is between DC2 and DC3. There is no physical Continuous Access link present between DC1 and DC3, but a phantom device group must be present between DC1 and DC3.
2. multi-target-bi-link: When the package is configured for 1:2 or multi target topology with two Continuous Access links. One, the sync Continuous Access link, is between DC1 and DC2, and the other, the journal Continuous Access link, is between DC1 and DC3. There is no physical Continuous Access link present between DC2 and DC3, but a phantom device group must be present between DC2 and DC3.

DC1_NODE_LIST Comma separated list of all node names that are in data center one (DC1). The node list should begin and end with quotation marks. For example, DC1_NODE_LIST=”node1, node2”. This parameter is not required to be set for a package configured for two data centers.

DC2_NODE_LIST Comma separated list of all node names that are in data center two (DC2). The node list should begin and end with quotation marks. For example, DC2_NODE_LIST=”node3, node4”. This parameter is not required to be set for a package configured for two data centers.

DC3_NODE_LIST Comma separated list of all node names that are in data center three (DC3). The node list should begin and end with quotation marks. For example, DC3_NODE_LIST=”node5, node6”. This parameter is not required to be set for a package configured for two data centers.

DC1_DC2_DEVICE_GROUP The Raid Manager device group between DC1 and DC2 for this package. This device group is defined in the /etc/horcm<#>.conf file. This parameter is not required to be set for a package configured for two data centers.

DC2_DC3_DEVICE_GROUP The Raid Manager device group between DC2 and DC3 for this package. This device group is defined in the /etc/horcm<#>.conf file. This will be a phantom device group if 3DC_TOPOLOGY=”multi-target-bi-link”. This parameter is not required to be set for a package configured for two data centers.

DC1_DC3_DEVICE_GROUP The Raid Manager device group between DC1 and DC3 for this package. This device group is defined in the /etc/horcm<#>.conf file. This will be a phantom device group if 3DC_TOPOLOGY=”multi-hop-bi-link”. This parameter is not required to be set for a package configured for two data centers.
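As an illustration of the 3DC parameters just described, the following is a minimal sketch of the 3DC-specific lines in a package environment file for the default multi-hop topology. The node names and device group names are hypothetical, and all other variables are assumed to keep their defaults.

CLUSTER_TYPE="metro"                  # "continental" on DC3 nodes
3DC_TOPOLOGY="multi-hop-bi-link"      # sync link DC1-DC2, journal link DC2-DC3
DC1_NODE_LIST="node1, node2"
DC2_NODE_LIST="node3, node4"
DC3_NODE_LIST="node5, node6"
DC1_DC2_DEVICE_GROUP="pkg_dg_12"      # sync device group
DC2_DC3_DEVICE_GROUP="pkg_dg_23"      # journal device group
DC1_DC3_DEVICE_GROUP="pkg_dg_13"      # phantom device group in this topology

For the multi-target topology (3DC_TOPOLOGY="multi-target-bi-link"), the journal device group would instead be DC1_DC3_DEVICE_GROUP and the phantom device group would be DC2_DC3_DEVICE_GROUP, as described above.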
***************************************************
Device Group Monitor Specific Variables
Device Group Monitor is currently available only with packages configured for two data centers (2DC). If a monitor variable is not defined (commented out), the default value is used.
***************************************************

FENCE Fence level. Possible values are NEVER, DATA, and ASYNC. Use ASYNC for improved performance over long distances. If a Raid Manager device group contains multiple items where either the PVOL or SVOL devices reside on more than a single XP Series array, then the fence level must be set to “data” in order to prevent the possibility of inconsistent data on the remote side if a Continuous Access link or an array goes down. The side effect of the “data” fence level is that if the package is running and a link goes down, an array goes down, or the remote data center goes down, then write() calls in the package application will fail, causing the package to fail.
NOTE: The Continuous Access Journal is used for asynchronous data replication. Fence level async is used for a journal group pair. If the package is configured for three data centers (3DC), this parameter holds the fence level of the device group between DC1 and DC2. As the device group between DC1 and DC2 is always synchronous, the fence level will be either “data” or “never”. The fence level of the device group between DC2 and DC3 or DC1 and DC3 is always assumed to be “async”, and you need not specify it.

HORCMINST This is the instance of the Raid Manager that the control script will communicate with. This instance of Raid Manager must be started on all nodes before this package can be successfully started. (Note: If this variable is not exported, Raid Manager commands used in this script may fail.)

HORCMPERM This variable supports the security feature, RAID Manager Protection Facility, on the Continuous Access devices. (Note: If the RAID Manager Protection Facility is disabled, set this variable to MGRNOINST. This is the default value.)

HORCTIMEOUT (Default=360) This variable is used only in asynchronous mode when the horctakeover command is issued; it is ignored in synchronous mode. The value is used as the timeout value (-t option) in the horctakeover command. The value is the time to wait while horctakeover re-synchronizes the delta data from the PVOL to the SVOL. It is used for swap-takeover and SVOL takeover. If the timeout value is reached and a timeout occurs, horctakeover returns the value EX_EWSTOT. The unit is seconds. In asynchronous mode, when there is a Continuous Access link failure, both the PVOL and SVOL sides change to a PSUE state. However, this change will not take place until the Continuous Access link timeout value, configured in the Service Processor (SVP), has been reached. If the horctakeover command is issued during this timeout period, the horctakeover command could fail if its timeout value is less than that of the Continuous Access link timeout. Therefore, it is important to set the HORCTIMEOUT variable to a value greater than the Continuous Access link timeout value. The default Continuous Access link timeout value is 5 minutes (300 seconds). A suggested value for HORCTIMEOUT is 360 seconds. During package startup, the default startup timeout value of the package is set to NO_TIMEOUT in the package ASCII file.
However, if there is a need to set a startup timeout value, then the package startup timeout value must be greater than the HORCTIMEOUT value, which is greater than the Continuous Access link timeout value: Pkg Startup Timeout > HORCTIMEOUT > Continuous Access link timeout value For Continuous Access Journal mode package, journal volumes in PVOL may contain a significant amount of journal data to be transferred to SVOL. Also, the package startup time may increase significantly when the package fails over and waits for all of the journal data to be flushed. The HORCTIMEOUT should be set long enough to accommodate the outstanding data transfer from PVOL to SVOL. MULTIPLE_PVOL_OR_SVOL_FRAME_FOR_PKG (Default=0) This parameter must be set to 1 if a PVOL or an SVOL for this package resides on more than one XP frame. Currently, only a value of 0 is supported for this parameter. NOTE: Future releases may allow a value of 1. Values: 0—(Default) Single frame. 1—Multiple frames. If this parameter is set to 1, then the device group must be created with the 394 Environment File Variables for Serviceguard Integration with Continuous Access XP “data” fence level, and the FENCE parameter must be set to “data” in this script. Contains the full path name of the package directory. This directory must be unique for each package to prevent the status file from being overwritten when there are multiple packages. The operator may create the FORCEFLAG file in this directory. Seconds to wait for each “pairevtwait” interval. WAITTIME (Note: do not set this to less then 300 seconds because the disks have some long final processing when the copy state reaches 100%). The following table describes AUTO_* variables settings and the package’s startup behavior that are supported with Metrocluster with Continuous Access XP version A.05.00 on HP-UX 11.0 and 11i or later systems. PKGDIR Table A-1 AUTO_FENCEDATA_SPLIT Local State Remote State Fence Level AUTO_FENCEDATA_SPLIT AUTO_FENCEDATA_SPLIT =1 or =0 (Default) FORCEFLAG=yes PVOL_PSUE SVOL_PAIR DATA PVOL_PSUE EX_ENORMT PVOL_PSUE EX_CMDIOE Do not start with exit Perform a PVOL takeover. After 1. the takeover succeeds, package starts with a warning message about the data is not remotely protected in the package’s control log file. 395 Table A-2 AUTO_NONCURDATA Local State Remote State Fence Level AUTO_NONCURDATA AUTO_NONCURDATA =1 or =0 (Default) FORCEFLAG=yes PVOL_PSUS EX_ENORMT PVOL_PSUS EX_CMDIOE NEVER/DATA/ASYNC Do not start with exit 1 PVOL_PFUS EX_ENORMT ASYNC PVOL_PFUS EX_CMDIOE SVOL_PAIRSVOL_PAIR PVOL_PSUE SVOL_ PAIR EX_ENORMT EX_CMDIOE NEVER/DATA/ASYNC Do not start with exit 1 SVOL_PFULSVOL_PFULSVOL_PFUL PVOL_PSUE EX_ENORMT EX_CMDIOE ASYNC SVOL-PAIR;MINAP=0 PVOL-PAIR; MINAP>0, MINAP=0 (Continuous Access-Journal) Do not start with exit 1 EX_ENORMTEX_CMDIOE Starts with a warning message about non-current data in the package’s control log file. Perform SVOL takeover, which changes SVOL to PSUS(SSWS). After the takeover succeeds, package starts with a warning message about non-current data in the package’s control log file Perform SVOL takeover, which changes SVOL to PSUS(SSWS). After the takeover succeeds, package starts with a warning message about non-current data in the package's log file Table A-3 AUTO_PSUEPSUS Local State Remote State Fence Level AUTO_PSUEPSUS =0 (Default) AUTO_PSUEPSUS =1 or FORCEFLAG=yes PVOL_PSUE SVOL_PSUS ASYNC Do not start with exit 1 PVOL_PFUS SVOL_PSUS Pairresync-swapp works, package starts up. 
* If pairresync-swapp works, package starts up. *If pairresync-swapp fails, package does not start with exit 1. 396 Environment File Variables for Serviceguard Integration with Continuous Access XP Table A-4 AUTO_PSUSSSSWS Local State Remote State Fence Level AUTO_PSUSSSSWS =0 (Default) PVOL_PSUS SVOL_PSUS(SSWS) NEVERD /ATAA /SYNC Do not start with exit 1 AUTO_PSUSSSSWS =1 or FORCEFLAG=yes Pairresync-swapp works, package starts up. * If pairresync-swapp works, package starts up. *If pairresync-swapp fails, package does not start with exit 1. Table A-5 AUTO_SVOLPFUS Local State Remote State Fence Level AUTO_SVOLPFUS =0 (Default) AUTO_SVOLPFUS =1 or FORCEFLAG=yes SVOL_PFUS PVOL_PFUS ASYNC SVOL_PFUS EX_ENORMT SVOL_PFUS EX_CMDIOE Perform SVOL takeover, which changes SVOL to PSUS(SSWS). After the takeover succeeds, package starts with a warning message about non-current data in the package’s control log file Do not start on the local node with exit 2. Table A-6 AUTO_SVOLPSUE Local State Remote State Fence Level AUTO_SVOLPSUE AUTO_SVOLPSUE =0 (Default) =1 or FORCEFLAG=yes SVOL_PSUS PVOL_PSUS NEVER/DATA/ASYNC SVOL_PSUS EX_ENORMT SVOL_PSUS EX_CMDIOE Do not start with Perform a SVOL exit 2. to PSUS(SSWS). After the takeover succeeds, package starts with a warning message about non-current data in the package’s control log file. 397 Table A-7 AUTO_SVOLPSUS Local State Remote State Fence Level AUTO_SVOLPSUS =0 (Default) SVOL_PSUS PVOL_PSUS NEVERD /ATAA /SYNC Do not start with exit 2. SVOL_PSUS EX_ENORMT SVOL_PSUS EX_CMDIOE AUTO_SVOLPSUS =1 or FORCEFLAG=yes Perform a SVOL to PSUS(SSWS). After the takeover succeeds, package starts with a warning message about non-current data in the package’s control log file. NOTE: Environment File Variables for Device Group Monitor: Device Group Monitor is supported only with packages confirgured for two data centers (2DC). The following list the monitor specific variables that have been modified or added for Metrocluster with Continuous Access XP. If a monitor variable is not defined (commented out), the default value is used: (Default=10 minutes) MON_POLL_INTERVAL This parameter defines the polling interval for the monitor service (if configured). If the parameter is not defined (commented out), the default value is 10 minutes. Otherwise, the value will be set to the desired polling interval in minutes. 398 MON_NOTIFICATION_FREQUENCY (Default=0) This parameter controls the frequency of notification messages sent when the state of the device group remains the same. If the value is set to 0, then the monitor will only send notifications when the device group state changes. If the value is set to n where n is greater than 0, the monitor will send a notification every nth polling interval or when the device group state has changed. If the parameter is not defined (commented out), the default value is 0. MON_NOTIFICATION_EMAIL (Default=empty string) This parameter defines the email addresses that the monitor will use to send email notifications. The variable must use fully qualified email addresses. If multiple email addresses are defined, the comma must be used as a separator. If the parameter is not defined (commented out) or the default value is an empty string, this will Environment File Variables for Serviceguard Integration with Continuous Access XP indicate to the monitor that no email notifications will be sent. MON_NOTIFICATION_SYSLOG (Default=0) his parameter defines whether the monitor will send notifications to the syslog file. 
When the parameter is set to 0, the monitor will NOT send notifications to the syslog file. When the parameter is set to 1, the monitor will send notifications to the syslog file. If the parameter is not defined (commented out), the default value is 0. MON_NOTIFICATION_CONSOLE (Default=0) This parameter defines whether the monitor will send console notifications. When the parameter is set to 0, the monitor will NOT send console notifications. When the parameter is set to 1, the monitor will send console notifications. If the parameter is not defined (commented out), the default value is 0. AUTO_RESYNC This parameter defines the pre-defined resynchronization actions that the monitor can perform when the package is on the PVOL side and the monitor detects the Continuous Access data replication link is down. If the variable is not defined or commented, the default value of 0 is used. Values: 0— (Default) When the parameter is set to 0, the monitor will not perform any resynchronization actions. 1—When the parameter is set to 1 and the data replication link is down, the monitor will split the remote BC (if configured) and try to resynchronize the device. Until the resynchronization starts, the monitor will try to resynchronize every polling interval. Once the device group has been completely resynchronized, the monitor will resynchronize the remote BC. 2—When the parameter is set to 2 and the data replication link is down, the monitor will only try to perform resynchronization if a file named MON_RESYNC exists in the package directory 399 (PKGDIR). The monitor will not perform any operations to the remote BC (that is, split and resynchronize the remote BC). Therefore, this setting is used when you want to manage the remote BC 400 Environment File Variables for Serviceguard Integration with Continuous Access XP B Environment File Variables for Metrocluster Continuous Access EVA This appendix lists all Environment File variables that have been modified or added for Metrocluster with Continuous Access EVA. It is recommended that you use the default settings for most of these variables, so exercise caution when modifying them: This parameter defines the clustering CLUSTER_TYPE environment in which the script is used. Should be set to “metro” if this is a Metrocluster environment and “continental” if this is a Continentalclusters environment. A type of “metro” is supported only when the HP Metrocluster product is installed. A type of “continental” is supported only when the HP Continentalclusters product is installed. Contains the full path name of the package PKGDIR directory. This directory must be unique for each package to prevent the log file from being overwritten when there are multiple packages. This directory is also the place where the operator creates the FORCEFLAG file to force a package to start under certain circumstances. The file is deleted after the cmrunpkg packagename is running, so it must be re-created each time if you wish to start packages under these circumstances. DT_APPLICATION_STARTUP_POLICY This parameter defines the preferred policy in allowing the application to start with respect to the state of the data in the local volumes. It should be set to one of the following two policies: Availability_Preferred: The user chooses this policy if he prefers application availability. Metrocluster software will allow the application to start as long as the data is consistent; even though, it may not be current. 
Data_Currency_Preferred: This policy is chosen if it is preferred that the application operates on consistent and current data. Metrocluster software will only allow the application to start if it can tell for sure that the data the application will operate on is current. This policy only focuses on the state of the local data (with respect to the application) being consistent and current.

WAIT_TIME (0 or greater than 0, in minutes) This parameter defines the timeout, in minutes, to wait for completion of the data merging or copying for the DR group before startup of the package on the destination volume. If WAIT_TIME is greater than zero, and if the state of the DR group is “merging in progress” or “copying in progress”, Metrocluster software will wait up to the WAIT_TIME value for the merging or copying to complete. If WAIT_TIME expires and merging or copying is still in progress, the package will fail to start with an error and will not start on any node in the cluster. If WAIT_TIME is 0 (the default value), and if the state of the DR group is “merging in progress” or “copying in progress”, Metrocluster software will not wait and will return an exit 1 code to the Serviceguard package manager. The package will fail to start with an error and will not start on any node in the cluster.

DR_GROUP_NAME The name of the DR group used by this package. The DR group name is defined when the DR group is created.

DC1_STORAGE_WORLD_WIDE_NAME The world wide name of the EVA storage system that resides in Data Center 1. This storage system name is defined when the storage is initialized.

DC1_SMIS_LIST A list of the management servers that reside in Data Center 1. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified.

DC1_HOST_LIST A list of the clustered nodes that reside in Data Center 1. Multiple names can be defined by using commas as separators.

DC2_STORAGE_WORLD_WIDE_NAME The world wide name of the EVA storage system that resides in Data Center 2. This storage system name is defined when the storage is initialized.

DC2_SMIS_LIST A list of the management servers that reside in Data Center 2. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in the order that they are specified.

DC2_HOST_LIST A list of the clustered nodes that reside in Data Center 2. Multiple names can be defined by using commas as separators.

QUERY_TIME_OUT (Default 120 seconds) Sets the time in seconds to wait for a response from the SMI-S CIMOM in the storage management appliance. The minimum recommended value is 20 seconds. If the value is set to be smaller than 20 seconds, Metrocluster software may time out before getting the response from SMI-S, and the package will fail to start with an error, not starting on any node in the cluster.

C Environment File Variables for Metrocluster with EMC SRDF
This appendix lists all Serviceguard control script variables that have been modified or added for Metrocluster with EMC SRDF.
It is recommended that you use the default settings for most of these variables, so exercise caution when modifying them: Default: 0 AUTOR1RWSUSP This variable is used to indicate whether a package should be automatically started when it fails over from an R1 host to another R1 host and the device group is in suspended state. If it sets to 0, the package will halt unless ${PKGDIR}/ FORCEFLAG file has been created. The package halts because it is not known what has caused this condition. This could be caused by an operational error or a Symmetrix internal event, such as primary memory full. If in this situation you want to automatically start the package, AUTOR1RWSUSP should be set to 1. AUTOR1RWNL Default: 0 This variable indicates that when the package is being started on an R1 host, the Symmetrix is in a Read/Write state, and the SRDF links are down, the package will be automatically started. Although the script cannot check the state of the Symmetrix on the R2 side to validate conditions, the Symmetrix on the R1 side is in a ‘normal’ state. To require operator intervention before starting the package under these conditions, set AUTOR1RWNL=1 and create the file /etc/cmcluster/package_name/FORCEFLAG. AUTOR1UIP Default: 1 This variable indicates that when the package is being started on an R1 host and the Symmetrix is being synchronized from the Symmetrix on the R2 side, the package will halt unless the operator creates the $PKGDIR/ FORCEFLAG file. The package halts because performance degradation of the application will occur while the resynchronization is in progress. More importantly, it is better to wait for the resynchronization to finish to guarantee that the data are consistent even in the case of a rolling disaster where a second failure occurs before the first failure is recovered from. To always automatically start the package even when resynchronization is in progress, set AUTOR1UIP=0. Doing so will result in inconsistent data in case of a rolling disaster. 405 406 AUTOR2RWNL Default: 1 AUTOR2WDNL=1 indicates that when the package is being started on an R2 host, the Symmetrix is in a Write-disabled state, and the SRDF links are down, the package will not be started. Since we cannot check the state of the Symmetrix on the R1 side to validate conditions, the data on the R2 side may be non-current and thus a risk that data loss will occur when starting the package up on the R2 side. To have automatic package startup under these conditions, set AUTOR2WDNL=0 AUTOR2WDNL Default: 1 AUTOR2RWNL=1 indicates that when the package is being started on an R2 host, the Symmetrix is in a read/write state, and the SRDF links are down, the package will not be started. Since we cannot check the state of the Symmetrix on the R1 side to validate conditions, the data on the R2 side may be non-current and thus a risk that data loss will occur when starting the package up on the R2 side. To have automatic package startup under these conditions, set AUTOR2RWNL=0 AUTOR2XXNL Default: 0 A value of 0 for this variable indicates that when the package is being started on an R2 host and at least one (but not all) SRDF links are down, the package will be automatically started. This will normally be the case when the ‘Partitioned+Suspended’ RDF Pairstate exists. We cannot check the state of all Symmetrix volumes on the R1 side to validate conditions, but the Symmetrix on the R2 side should be in a ‘normal’ state. 
To require operator intervention before starting the package under these conditions, set AUTOR2XXNL=1.

AUTOSWAPR2 Default: 0 A value of 0 for this variable indicates that when the package is failing over to Data Center 2, it will not perform an R1/R2 swap. To perform an R1/R2 swap, set AUTOSWAPR2 to 1 or 2. This allows an automatic R1/R2 swap to occur only when the SRDF link and the two Symmetrix frames are properly functioning. When AUTOSWAPR2 is set to 1, the package will attempt to fail over the device group to Data Center 2, followed by an R1/R2 swap. If either of these operations fails, the package will fail to start on Data Center 2. When AUTOSWAPR2 is set to 2, the package will continue to start up even if the R1/R2 swap fails, provided the failover succeeds. In this scenario, the data will not be protected remotely. AUTOSWAPR2 cannot be set to 1 or 2 if CONSISTENCYGROUPS is set to 1. Verify that you have the minimum requirements for R1/R2 swapping by referring to the most up-to-date version of the Metrocluster release notes.

AUTO_SPLITR1 Default: 0 This variable is used to indicate whether a package is allowed to start when it fails over from an R1 host to another R1 host when the device group is in the split state. A value of 0 for this variable indicates that the package startup attempt will fail. To allow startup of the package in this situation, the variable should be set to a value of 1.

CLUSTER_TYPE This parameter defines the clustering environment in which the script is used. Set it to “metro” if this is a Metrocluster environment and “continental” if this is a Continentalclusters environment. A type of “continental” is supported only when the HP Continentalclusters product is installed.

CONSISTENCYGROUPS Default: 0 This parameter tells Metrocluster whether or not consistency groups were used in configuring the R1 and R2 volumes on the Symmetrix frames. A value of 0 is the normal setting if you are not using consistency groups. A value of 1 indicates that you are using consistency groups. (Consistency groups are required for M by N configurations.) If CONSISTENCYGROUPS is set to 1, AUTOSWAPR2 cannot be set to 1. Ensure that you have the minimum requirements for Consistency Groups by referring to the Metrocluster release notes.

DEVICE_GROUP This variable contains the name of the Symmetrix device group for the package on that node, which contains the name of the consistency group in an M by N configuration.

PATH Has been modified to include the path name of the Symmetrix SymCLI commands. This should be set to the default location of /usr/symcli/bin unless you have changed the location.

PKGDIR In addition to indicating the location of the package control scripts, this variable also will be used for the following files:
• symcfg.out – contains the results of the symcfg list command, used for model and revision information.
• symrdf.out – contains the results of symrdf query commands run from the package control script.
• awk.out – contains the output from using awk to parse the symrdf.out file.
• FORCEFLAG – forces a package to start automatically under certain circumstances if this file is present. The cmrunpkg packagename command deletes this file, so it must be recreated each time if you wish to start packages under these circumstances.

RDF_MODE Default: sync This parameter defines the data replication mode for the device group. The supported modes are “sync” for synchronous and “async” for asynchronous.
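As a brief illustration of the AUTOSWAPR2 constraint described above, the following hedged excerpt from a hypothetical Metrocluster with EMC SRDF environment file enables an automatic R1/R2 swap. The device group name and package directory are assumptions borrowed from the cmdrprev sample later in this document, and all other variables are assumed to keep their defaults.

DEVICE_GROUP="pkga_dg"          # hypothetical Symmetrix device group
PKGDIR="/etc/cmcluster/pkga"    # hypothetical package directory
CONSISTENCYGROUPS=0             # must be 0 when AUTOSWAPR2 is set to 1 or 2
AUTOSWAPR2=1                    # fail over the device group, then perform the R1/R2 swap
RDF_MODE="sync"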
If RDF_MODE is not defined, synchronous mode is assumed. RETRY Default: 5. This is the number of times a SymCLI command is repeated before returning an error. Use the default value for the first package, and slightly larger numbers for additional packages making sure that the total of RETRY*RETRYTIME is approximately 5 minutes. Larger values for RETRY may cause the start-up time for the package to increase when there are multiple packages starting concurrently in the cluster that access the Symmetrix arrays. RETRYTIME Default: 5. This is the is the number of seconds between retries. The default value of 5 seconds should be used for the first package. The values should be slightly different for other packages. RETRYTIME should increase by two seconds for each package. The product of RETRY * RETRYTIME should be approximately five minutes. These variables are used to decide how often and how many times to retry the Symmetrix status and state change commands.Larger values for RETRYTIME may cause the start-up time for the package to increase when there are multiple packages starting concurrently in the cluster that access the Symmetrix arrays. Environment File Variables for Metrocluster with EMC SRDF SYNCTIMEOUT Default: 0. This variable denotes the number of seconds to wait for resync to complete after failback of the Symmetrix device group. If you set the value to 0, then the package will start after failback without waiting for resynchronization to complete. If you set the value to 1, then the package waits till resynchronization is complete before starting up. If SYNCTIMEOUT is set to any value from 5 to 36000, then the package will wait the specified time for resynchronization to complete after failback. If resynchronization does not complete even after the specified time, then the package will fail to start up; if resynchronization completes before that, then package will start up immediately after resynchronization is complete. 409 410 D Configuration File Parameters for Continentalclusters This appendix lists all Continentalclusters configuration file variables. See Chapter 2: “Designing a Continental Cluster”, for suggestions on coding these parameters. This is a time interval, in minutes and/or seconds, CLUSTER_ALARM [Minutes] after which the notifications defined in the MINUTES [Seconds] SECONDS associated NOTIFICATION parameters are sent and failover to the Recovery Cluster using the cmrecovercl command is enabled. This number must be a positive integer. Minimum is 30 seconds, maximum is 3600 seconds or 60 minutes (one hour) This is a time interval, in minutes and/or seconds, CLUSTER_ALERT [Minutes] after which the notifications defined in the MINUTES [Seconds] SECONDS associated NOTIFICATION parameters are sent. Failover to the Recovery Cluster using the cmrecovercl command is not enabled at this time. This number must be a positive integer. Minimum is 30 seconds, maximum is 3600 seconds or 60 minutes (one hour) This is the domain of the nodes in the previously CLUSTER_DOMAIN domainname specified cluster. This domain is appended to the NODE_NAME to provide a full system address across the WAN. CLUSTER_EVENT Clustername/Status This is a cluster name associated with one of the following changes of status: • up - the cluster is up and running • unreachable - the cluster is unreachable • down - the cluster is down, but nodes are responding • error - an error is detected The maximum length is 47 characters. 
When the MONITORING_CLUSTER detects a change in status, one or more notifications are sent, as defined by the NOTIFICATION parameter, at time intervals defined by the CLUSTER_ALERT and CLUSTER_ALARM parameters. CLUSTER_NAME clustername The name of a member cluster within the continental cluster. It should be the same name that is defined in the Serviceguard cluster 411 configuration ASCII file. Maximum size is 31 bytes. All nodes in the cluster should be listed after this variable using the NODE_NAME variable. A MONITOR_PACKAGE_NAME and MONITOR_INTERVAL should also be associated with each CLUSTER_NAME. CONTINENTAL_CLUSTER_NAME DATA_RECEIVER_PACKAGE DATA_SENDER_PACKAGE 412 name The name of a continental cluster managed by the Continentalclusters product. Maximum size is 31 bytes. This name cannot be changed after the configuration is applied. You must first delete the existing configuration if you want to choose a different name. clustername/packagename This variable is only used if the data replication is carried out by a separate software application that must be kept highly available. If the replication software uses a receiver process, you include this variable in the configuration file. Maximum size is 80 characters. The parameter consists of a pair of names: the name of the cluster that receives the data to be replicated (usually the Recovery Cluster) as defined in the Serviceguard cluster configuration ASCII file, followed by a slash (“/”), followed by the name of the data replication receiver package as defined in the Serviceguard package configuration ASCII file. Some replication software may only have a receiver package as separate package because the sender package is built into the application. clustername/packagename This variable is only used if the data replication is carried out by a separate software application that must be kept highly available. If the replication software uses a sender process, you include this variable in the configuration file. Maximum size is 80 characters. The parameter consists of a pair of names: the name of the cluster that sends the data to be replicated (usually the Primary Cluster) as defined in the Serviceguard cluster configuration ASCII file, followed by a slash (“/”), followed by the name of the data replication sender package as defined in the Serviceguard package Configuration File Parameters for Continentalclusters configuration ASCII file. Some replication software may only have a receiver package as separate package because the sender package is built into the application. MONITOR_INTERVAL n The interval, in seconds, that the Continentalclusters monitor polls the cluster, nodes, and packages to see if the status has changed. This number must be an integer. The minimum value is 30 seconds, the default is 60 seconds, and the maximum is 300 seconds (5 minutes). MONITOR_PACKAGE_NAME packagename MONITORING_CLUSTER This is the name of the Serviceguard package containing the Continentalclusters monitor. Maximum size is 31 bytes. Name This is name of the cluster that polls the cluster named in the CLUSTER_EVENT and sends notification. Maximum length is 31 bytes. This is the unqualified node name as defined in the DNS name server configuration. Maximum size is 31 bytes. This is a destination and message associated with NOTIFICATION Destination a specific CLUSTER_ALERT or CLUSTER_ALARM. “Message” The maximum size of the message string is 170 characters including the quotation marks. 
The message string must be entered on a separate single line in the configuration file. The following destinations are acceptable: • CONSOLE - write the specified message to the console. • EMAIL Address - send the specified message to an email address. You can use an email address provided by a paging service to set up automatic paging. Consult your pager service provider for details. • OPC Level - send the specified message to OpenView IT/Operations.The Levelmay be 8 (normal), 16 (warning), 64 (minor), 128 (major), or 32 (critical). • SNMP Level - send the specified message as an SNMP trap. The Levelmay be 1 (normal), 2 (warning), 3 (minor), 4 (major), or 5 (critical). • SYSLOG - Append a notice of the specified message to the NODE_NAME nodename 413 /var/adm/syslog/syslog.log file. Note that the text of the message is not placed in the syslog file, only a notice from the monitor. • TCP Nodename:Portnumber - send the specified message to a TCP port on the specified node. • TEXTLOG Pathname - append the specified message to a specified text log file. • UDP Nodename:Portnumber - send the specified message to a UDP port on the specified node. Any number of notifications may be associated with a given alert or alarm. PRIMARY_PACKAGE Clustername/Packagename This is a pair of names: the name of a cluster as defined in the Serviceguard cluster configuration ASCII file, followed by a slash (“/”), followed by the name of the primary package as defined in the Serviceguard package configuration ASCII file. Maximum size is 80 characters. RECOVERY_GROUP_NAME RECOVERY_PACKAGE name This is a name for the set of related primary packages on one cluster and the recovery packages on another cluster that protect the primary packages. The maximum size is 31 bytes. You create a recovery group for each package that should be started on the recovery cluster in case of a failure on the primary cluster. A PRIMARY_PACKAGE and RECOVERY_PACKAGE should be associated with each RECOVERY_GROUP_NAME. Clustername/Packagename This is a pair of names: the name of the recovery cluster as defined in the Serviceguard cluster configuration ASCII file, followed by a slash (“/”), followed by the name of the recovery package as defined in the Serviceguard package configuration ASCII file. Maximum size is 80 characters. CONTINENTAL_CLUSTER_STATE_DIR Absolute path to a file system where the Continentalclusters state data will be stored. The state data file system should be created on a shared disk in the cluster and specified as part of the monitor package configuration. The path 414 Configuration File Parameters for Continentalclusters specified here should be created in all nodes in the Continentalclusters. The monitor package control script should mount the file system at this specified path on the node it is started. This parameter is optional if the maintenance mode feature for recovery groups is not required. This parameter is mandatory if maintenance mode feature for recovery groups is required. REHEARSAL_PACKAGE ClusterName/PackageName This is a pair of names: the name of a cluster as defined in the Serviceguard cluster configuration ASCII file, followed by a slash ("/"), followed by the name of the rehearsal package as defined in the Serviceguard package configuration ASCII file. This variable is only used for rehearsal operations. This package is started on the recovery cluster by the cmrecovercl-r command. 
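To show how these parameters fit together, the following is a partial, hedged sketch of a Continentalclusters ASCII configuration file. It reuses the example cluster and package names that appear later in this document (Cluster1, Cluster2, billing_primpkg, billing_recpkg, billing_rhpkg); the domain names are hypothetical, the monitoring and notification entries are abbreviated, and you should always start from the template generated by cmqueryconcl rather than from this sketch.

CONTINENTAL_CLUSTER_NAME    ccluster1
CLUSTER_NAME                Cluster1
CLUSTER_DOMAIN              siteA.example.com      # hypothetical domain
NODE_NAME                   node1
NODE_NAME                   node2
MONITOR_PACKAGE_NAME        ccmonpkg
MONITOR_INTERVAL            60
CLUSTER_NAME                Cluster2
CLUSTER_DOMAIN              siteB.example.com      # hypothetical domain
NODE_NAME                   node3
NODE_NAME                   node4
MONITOR_PACKAGE_NAME        ccmonpkg
MONITOR_INTERVAL            60
RECOVERY_GROUP_NAME         billing_recgp
PRIMARY_PACKAGE             Cluster1/billing_primpkg
RECOVERY_PACKAGE            Cluster2/billing_recpkg
REHEARSAL_PACKAGE           Cluster2/billing_rhpkg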
415 416 E Continentalclusters Command and Daemon Reference This appendix lists all commands and daemons used with Continentalclusters. Manual pages are also available online. This command verifies the Continentalclusters cmapplyconcl [-v] [-C] configuration as specified in filename, creates filename or updates the binary, and distributes it to all nodes in the continental cluster. It is not necessary to halt the Serviceguard cluster in order to run this command; however, the Continentalclusters monitor package must be halted.If cmapplyconcl is specified when the continentalcluster has already been configured, the configuration will be updated with the configuration changes. Before updating Continentalclusters, all impacted recovery groups should be moved out of maintenance mode (i.e. enabled).The cmapplyconcl command must be run when a configuration change is made to the Serviceguard cluster that impacts the Continentalclusters configuration. For example, if a node is added to the Serviceguard cluster, the Continentalclusters ASCII file should be edited to include the new NODE_NAME.All nodes within the Serviceguard cluster must be running prior to the cmapplyconcl command being run. Options are: Verbose mode displays all -v messages. The name of the ASCII -C filename configuration file. This is a required parameter. cmcheckconcl [-v] -C filename This command verifies the Continentalclusters configuration specified in filename. It is not necessary to halt the Serviceguard cluster in order to run this command; however, the Continentalclusters monitor package must be halted. This command will parse the ASCII_file to ensure proper syntax, check parameter lengths, and validate object names such as the CLUSTER_NAME and NODE_NAME. Options are: 417 -C filename cmclrmond cmclsentryd cmdeleteconcl [-f] The name of the ASCII configuration file. This is a required parameter. This is the Continentalclusters monitor daemon that provides notification of remote cluster status through the Event Monitoring Service (EMS). This monitor runs on both the primary and recovery clusters. The cmclsentryd deamon notifies cmclrmond of any change in cluster status. Log messages are written to the EMS log file /etc/resmon/log/api.log on the node where the monitor was running when it detected a status event. This daemon, which is run from the monitor package (ccmonpkg) starts up the Continentalclusters monitor cmclrmond. Messages are logged to log file/var/adm/ cmconcl/sentryd.log, which may be read using the cmreadlog command. This command is used to delete the continentalclusters configuration from the entire Continentalclusters. This command will not remove the file system configured for recovery group maintenance mode feature. Options are: -f Delete the configuration files on all reachable nodes without further prompting. If this option is not used and if some nodes are unreachable, you will be prompted to indicate whether to proceed with deleting the configuration on the reachable nodes. If this option is used and some node has configuration files for a continental cluster with a different name, you will be prompted to indicate whether to proceed with deleting the configuration on that node. This command is used to force a cmforceconcl ServiceguardPackageEnableCommand Continentalclusters package to start. It allows a package to run even if the status of a remote package in the recovery group is unknown, which indicates that the software could not determine the status of the remote package. 
ServiceguardPackageEnableCommand is either a cmrunpkg or cmmodpkg command. 418 Continentalclusters Command and Daemon Reference cmomd cmqueryconcl filename This daemon is the Object Manager, which communicates with Serviceguard to provide information about cluster objects to the Continentalclusters monitor. Messages are logged to log file/var/opt/cmom/cmomd.log, which may be read using the cmreadlog command. This command cmqueryconcl creates a template ASCII Continentalclusters configuration file. The ASCII file should be customized for a specific Continentalclusters environment. After customization, this file should be verified by the cmcheckconcl command and distributed by using the cmapplyconcl command. If an ASCII file is not provided, output will be directed to stdout.This command should be run as the first step in preparing for Continentalclusters configuration. Options are: Verbose mode displays all -v messages. Declares an alternate location -C filename for the configuration file. The default is/etc/cmcluster/ cmoncl.config. cmreadlog -f input_file [output_file] This command formats the content of Object Manager and other log files for easier reading. The command is used when reading the /var/ opt/cmom/cmomd.log file and the /var/adm/ cmconcl/sentryd.log file. Options are: Specifies the name of the -f input_file managed object file (MOF file) to be read. This is a required parameter. The name of a file to which output_file the formatted output is written. If no file is specified, output is written to stdout. cmrecovercl [-f] This command performs the recovery actions necessary to start the recovery groups on current cluster. Care should be taken before issuing this command. It is important to contact the primary 419 cluster site to determine if recovery is necessary prior to running this command. This command will perform recovery actions only for recovery groups that are out of the maintenance mode (i.e. enabled). If the specified recovery group for -g option is in maintenance mode; the command will exit without recovering it. When -c option is used; the command will skip recovering recovery groups which are in the maintenance mode. This command can be issued from any node on the recovery cluster. This command first connects to the Continentalclusters monitoring package running on the recovery cluster. This may be a different cluster node than where the cmrecovercl command is being run. cmrecovercl connects to the monitoring package to verify that the primary cluster is in an Unreachable or Down state. If the primary cluster is reachable and the cluster is Up, this command will fail. Next, the data receiver packages on the recovery cluster (if any) are halted sequentially. Finally, the recovery packages are started on the recovery cluster. The recovery packages are started by enabling package switching globally (cmmodpkg -e) for each package. This will cause the package to be started on the first available node within the recovery cluster.The cmrecovercl command can only be run on a recovery cluster. The cmrecovercl command will fail if there has not been sufficient time since the primary cluster became unreachable. This command is only enabled after the time as configured via CLUSTER_ALARM parameters has been reached. Once a cluster alarm has been triggered, this command will be enabled and can be run. The -f option can be used to enable the command after the time as configured via CLUSTER_ALERT parameters has been reached. 
Options are: -f The force option enables cmrecovercl to function even though a CLUSTER_ALARM has not been received. 420 Continentalclusters Command and Daemon Reference cmrecovercl {-e | -d [f] }-g This command moves a recovery group in and out of the maintenance mode by disabling or enabling it. This command should be run only on the recovery cluster. Options are: Moves a recovery group out of the -e maintenance mode by enabling it. -d [-f] Moves a recovery group into the maintenance mode by disabling it. Use the -f option to forcefully move a recovery group into the maintenance mode when the primary cluster status is unknown or unreachable. cmrecovercl [-r]-g This command starts the rehearsal for the specified recovery group. This command should be run only on the recovery cluster. This command will fail if the specified recovery group is not in the maintenance mode. This command allows you to view the status and much of the configuration of a continental cluster.This command should be run as the last step when creating a Continentalclusters configuration to confirm the cluster status, or any time you would like to know cluster status. Options are: -v Verbose mode displays all messages. cmviewconcl [-v] 421 422 F Metrocluster Command Reference for Preview Utility This appendix describes the Data Replication Storage Failover Preview utility. This appendix discusses the following topics: • Overview of Data Replication Storage Failover Preview • Command Reference • Sample Output of the cmdrprev Command Overview of Data Replication Storage Failover Preview In an actual failure, packages are failed over to the standby site. As part of the package startup, the underlying storage is failed over based on the parameters defined in Metrocluster environment file. The storage failover can fail due to many reasons and can be categorized as the following: • Incorrect configuration or setup of Metrocluster and data replication environment The storage failover can fail if the Metrocluster Environment file contains syntax errors, or invalid parameter values, or the installed Metrocluster binaries are corrupt or have incorrect file permissions. Also in the case of Continuous Access EVA, the SMI-S service on the WMS could be down or the mccaeva.conf file is corrupt or stale. In the case of Continuous Access XP, the possible configuration errors are invalid horcm file, or the command device is not functional. • Invalid data replication state The data may not be in the write-order, which can be due to a track copy at the time of the failover attempt. Also, the data is not current (lagging behind the primary) and the Metrocluster environment variables are not set correctly to allow a failover on non-current data. The command cmdrprev previews the failover of data replication storage. It determines if storage failover would be successful in an actual package failover scenario. This command can be used in both Metrocluster and Continentalcluster installations. If the preview fails, the cmdrprev command displays on stdout a detailed log that identifies the cause for failure. NOTE: Detailed messages from the preview process are logged into the stdout. The command exit value indicates if the storage failover in an actual package will succeed or not. Table F-1 describes the exit values of the command. Overview of Data Replication Storage Failover Preview 423 Table F-1 Command Exit Value and its Description Value Description -1 The data replication storage failover preview failed. 
This indicates that in the event of an actual recovery process, the data replication storage failover will not succeed from on any node in the cluster. The data replication storage failover preview command failed, due to invalid command usage or due to invalid input parameters. 0 The data replication storage failover preview is successful from a specific node. This indicates if data replication storage failover will be successful in the event of a package failover. 1 The data replication storage failover preview failed. This indicates that in the event of an actual recovery process, the data replication storage failover will not succeed from on any node in the cluster. 2 The data replication storage failover preview failed due to node specific error conditions or due to transient conditions. This indicates that in the event of an actual recovery process, the data replication storage failover will not succeed on that node. Failover may be successful if attempted at a later time or attempted on a different node in the cluster. Command Reference This sections lists the Metrocluster commands that can be used for the preview utility. cmdrprev {-p |-e } The command previews the failover of data replication device group that indicates if the device group failover will be successful or not in event of an actual package failover. The command parses the Metrocluster Environment file associated to the or which is provided as the input parameter with the -e option. It ensures proper syntax, and verifies for valid parameter values. The command then previews the failover of the device group configured in the Metrocluster environment file. The storage preparation is based on the variables defined in the Metrocluster environment file. When run on the site where the primary storage is connected, the command previews the storage preparation for local package failover. However, when run on the site where secondary storage is connected, the command previews storage failover for remote package failover. The command does not change the data replication environment and can be used even if the package associated with the data replication storage is up on the cluster. If the FORCEFLAG is present in the package directory, this command will remove it. It is name of the package for which the failover -p package of data replication device group is previewed. The command uses the Metrocluster environment 424 Metrocluster Command Reference for Preview Utility file present in the package directory for the preview. The name of Metrocluster environment file present in the package directory has to be based on the naming convention. It is necessary for the package to be configured in the cluster in order to use this option. -e Metrocluster Environment It is the name of the Metrocluster environment file which is used by command to preview the File failover of data replication storage. Use this option when the Metrocluster environment file is outside the package directory. Sample Output of the cmdrprev Command The following procedure shows you how to use the cmdrprev command to preview the data replication preparation for a package in an MC/SRDF environment. 1. Verify that the Metrocluster environment file for the package pkga is present in the package directory on node dtsia14. 2. Verify that the environment variables are all set to the default values as shown below. 
AUTOSWAPR2=0 AUTOR1RWNL=0 AUTOR1UIP=1 AUTOR1RWSUSP=0 AUTOR2RWNL=1 AUTOR2WDNL=1 AUTOR2XXNL=0 AUTOSPLITR1=0 CONSISTENCYGROUPS=0 DEVICE_GROUP="pkga_dg" PKGDIR="/etc/cmcluster/pkga" RETRY=60 RETRYTIME=5 CLUSTER_TYPE="metrocluster" RDF_MODE="sync" 3. Run the following command on dtsia14: $> /usr/sbin/cmdrprev -p pkga Following is the output that is displayed: May 10 04:36:40 - Node dtsia14: Entering Data Replication Enabler May 10 04:36:41 - Node dtsia14: Data Replication State: pkga Role: R2 LocalRdfState: WD RemoteRdfState: RW Sample Output of the cmdrprev Command 425 RDF Mode: SYNC RepState: SYNCHRONIZED May 10 04:36:43 - Node dtsia14: Failover of the device group successfully. The package is allowed to start up. May 10 04:36:43 - Node dtsia14: Leaving Data Replication Enabler 4. Verify that the cmdrprev command returns the value of 0. The 0 value indicates that the data replication preparation will be successful in the event of a package failover from this node. 426 Metrocluster Command Reference for Preview Utility G Data Replication Rehearsal in a Sample Environment This appendix describes how to set up and run data replication (DR) rehearsal with the example of a single instance Oracle application with Continentalcluster with Continuous Access XP integration. For additional examples of setting up and running DR rehearsal in different environments, see Disaster Recovery Rehearsal in Continentalclusters white paper at http:// www.docs.hp.com. This appendix discusses the following topics: • Setup Environment • Rehearsing Failure for a Single Instance Application Setup Environment This section describes the setup required to rehearse failure of the recovery group billing_recgp which is configured with Cluster1/Cluster2 as the recovery pair where Cluster1 is the primary cluster and Cluster2 is the recovery cluster. It is assumed that Continentalclusters is configured with the recovery groups and that the primary packages are up at the primary cluster. To set up Continentalclusters for a DR rehearsal process, you need to make the following changes in your environment: • Set up the continentalcluster state directory on the monitor package non-replicated shared disk. • Configure the Continentalclusters monitor package script to mount FS /opt/ cmconcl/statedir on the non-replicated shared disk. You need to make these changes only once to prepare your environment for DR rehearsal. For this example, it is assumed that you have already made these configuration changes in your environment. Device Group Configuration Changes To prevent failure of primary packages during rehearsal (where replication is suspended), you must disable the domino mode for device groups in an SRDF environment. However, in a Continuous Access XP environment, you need not disable the Fence=data setting. Rehearsal Package Configuration To rehearse failure for the recovery group billing_recgp, you must configure a non-metrocluster type package. This package must be different from the recovery package with a separate package directory with its own package configuration and control file. Configure the rehearsal package with the volume group and the file system mount point that you have configured for the recovery package. In addition, to prevent split brain scenarios, it is recommended that you configure the rehearsal package with an IP address that is different from the one that is configured for the recovery package. 
Primary Package Metrocluster Environment File

If Continentalclusters is configured with Metrocluster with EMC SRDF, set the AUTOSPLITR1 variable to 1 before starting the rehearsal. This ensures that during the rehearsal the primary packages continue to be highly available at the primary cluster. For Continentalclusters with Continuous Access XP integration, no changes to the primary package are required.

Continentalclusters Configuration

In this example, Continentalclusters is configured with the rehearsal package name billing_rhpkg used for rehearsing the recovery group billing_recgp. In the recovery group section, configure the REHEARSAL_PACKAGE field for the recovery group billing_recgp with the rehearsal package name billing_rhpkg. Apply the Continentalclusters configuration information using the cmapplyconcl command from any cluster.

NOTE: As the cluster-level setup is already done, the Continentalclusters state directory name /opt/cmconcl/statedir used on the clusters is already configured.

Rehearsing Failure for a Single Instance Application

This section describes the procedure to rehearse failure of a specific recovery group. The following steps can be executed from any of the recovery cluster nodes. It is assumed that the primary package billing_primpkg is up on Cluster1 and that replication is in progress.

1. Verify the data replication environment. Use the cmdrprev command to preview the data replication environment preparation for an actual recovery. Run the following command on every node in the recovery cluster and confirm that the return value is always 0 (a loop that does this is sketched after this procedure):

# /usr/sbin/cmdrprev -p billing_recpkg

2. Move the recovery group billing_recgp into maintenance mode:

# cmrecovercl -d -g billing_recgp

3. Verify that the recovery group billing_recgp is in maintenance mode:

# cmviewconcl -v

4. Prepare the replication environment for DR rehearsal:

a. Manually suspend replication and enable write access to the recovery disk for the device group configured for the package:

$) pairsplit -g billing_dg -rw

b. For each volume group configured on the recovery disk, change the cluster ID by running the following commands from any of the recovery cluster nodes:

# vgchange -c n billing_vg
# vgchange -c y billing_vg

c. Split the business recovery copy pair, at the recovery cluster, that is configured for the package:

$) export HORCC_MRCF=1
$) pairsplit -g billing_dg
$) unset HORCC_MRCF

5. Start the rehearsal:

$) cmrecovercl -r -g billing_recgp

NOTE: You can use the cmviewcl command to verify that the rehearsal package billing_rhpkg was started.

6. Stop the rehearsal package:

$) cmhaltpkg billing_rhpkg

7. Restore the replication environment for recovery. First re-synchronize the recovery disk from the primary disk, and then re-synchronize the recovery disk business recovery copy with the recovery disk:

$) pairresync -g billing_dg
$) export HORCC_MRCF=1
$) pairresync -g billing_dg
$) unset HORCC_MRCF

8. Move the recovery group billing_recgp out of maintenance mode:

$) cmrecovercl -e -g billing_recgp

NOTE: You can use the cmviewconcl command to verify that the recovery group billing_recgp is no longer in maintenance mode.
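Step 1 requires the preview to succeed on every recovery cluster node. The loop below is a minimal sketch, not taken from the product documentation, of one way to automate that check. The node names rnode1 and rnode2 are hypothetical, and remote execution over ssh assumes that equivalent remote access is configured between the recovery cluster nodes.

# Run the cmdrprev preview for billing_recpkg on each recovery cluster node and
# report any node where the preview does not return 0.
for node in rnode1 rnode2
do
    ssh $node /usr/sbin/cmdrprev -p billing_recpkg > /dev/null 2>&1
    if [ $? -ne 0 ]
    then
        echo "Preview failed on $node; investigate before starting the rehearsal"
    fi
done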
For more information on performing a rehearsal with different configuration environments, see the Disaster Recovery Rehearsal in Continentalclusters white paper at http://www.docs.hp.com.

H Site Aware Disaster Tolerant Architecture Configuration Work Sheet

This appendix includes the worksheets that you need to use while configuring Site Aware Disaster Tolerant Architecture in your environment.

Metrocluster Site Configuration

Table H-1 Site Configuration (fill in one column for each site)
Physical Location: Name of the location.
Site Name: One-word name for the site that will be used in configurations.
Node Names: Names of the nodes to be used for configurations (list 1), 2), and so on for each site).
1st Heart Beat Subnet IP: IP address of the node on the first Serviceguard heartbeat subnet.
2nd Heart Beat Subnet IP: IP address of the node on the second Serviceguard heartbeat subnet.

Replication Configuration

Table H-2 Replication Configuration
Data Replication RAID Device Group Name: Name of the Continuous Access device group (dev_group).
Sites: Names of the sites.
Disk Array Serial #: Serial number of the disk array at each site.
Node Names: Names of the nodes at each site.
Command Device on Nodes: Raw device file path on each node.
Device group device names: For each dev_name parameter entry (rows 1 through 10), record the Site 1 LUN and the Site 2 LUN; specify the LUNs in CU:LDEV format.

CRS Sub-cluster Configuration – using CFS

Table H-3 Configuring a CRS Sub-cluster using CFS (fill in one column for each site)
CRS Sub Cluster Name: Name of the CRS cluster.
CRS Home: Local file system path for CRS HOME.
CRS Shared Disk Group name: CVM disk group name for the CRS shared disk.
CRS cluster file system mount point: Mount point path where the vote and OCR will be created.
CRS Vote Disk: Path to the vote disk or file.
CRS OCR Disk: Path to the OCR disk or file.
CRS DG MNP package: Name of the CFS disk group MNP package for the CRS shared disk group.
CRS MP MNP package: Name of the CFS mount point MNP package for the CRS cluster file system.
CRS MNP package: Name of the CRS MNP package.
CRS Member Nodes: Node names.
Private IP: IP addresses for the RAC interconnect.
Private IP names: IP address names for the RAC interconnect.
Virtual IP: IP addresses for the RAC VIP.
Virtual IP names: IP address names for the RAC VIP.

RAC Database Configuration

Table H-4 RAC Database Configuration (Property: Value)
Database Name: Name of the database.
Database Instance Names: Instance names for the database.
RAC data files file system mount point: Mount point for the Oracle RAC data files.
RAC data files CVM Disk group name: CVM disk group name for the Oracle RAC data files file system.
RAC flash files file system mount point: Mount point for the Oracle RAC flash files.
RAC flash files CVM Disk group name: CVM disk group name for the Oracle RAC flash file system.

Per-site entries (fill in one column for each site):
RAC Home: Local file system directory in which to install Oracle RAC.
RAC MNP: Package name for the RAC database.
RAC Data file DG MNP: CFS DG MNP package name for the RAC data files file system.
RAC Data file MP MNP: CFS MP MNP package name for the RAC data files file system.
RAC Flash Area DG MNP: CFS DG MNP package name for the RAC flash file system.
RAC Flash Area MP MNP: CFS MP MNP package name for the RAC flash file system.
Node Names
Database Instance Names

Site Controller Package Configuration

Table H-5 Site Controller Package Configuration
PACKAGE_NAME: Name of the Site Controller package.
Site Safety Latch /dts/mcsc/: Name of the EMS resource. The format is /dts/mcsc/.

For each site (rows 1 through 4 in each column):
Site: Value for the site attribute.
critical_package: Values for the critical_package attribute for this site.
managed_package: Values for the managed_package attribute for this site.

Glossary

A

application restart
Starting an application, usually on another node, after a failure. Applications can be restarted manually, which may be necessary if data must be restored before the application can run (for example, Business Recovery Services work like this). Applications can be restarted by an operator using a script, which can reduce human error. Or applications can be started on the local or remote site automatically after detecting the failure of the primary site.

arbitrator
Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down. See also quorum server.

asymmetrical cluster
A cluster that has more nodes at one site than at another. For example, an asymmetrical metropolitan cluster may have two nodes in one building and three nodes in another building. Asymmetrical clusters are not supported in all disaster tolerant architectures.

asynchronous data replication
Local I/O completes without waiting for the replicated I/O to complete; however, it is expected that asynchronous data replication will process the I/Os in the original order.

automatic failover
Failover directed by automation scripts or software (such as Serviceguard) and requiring no human intervention. In a Continentalclusters environment, the start-up of package recovery groups on the Recovery Cluster without intervention. See also application restart.

B

BC (Business Copy)
A PVOL or SVOL in an HP StorageWorks XP series disk array that can be split from or merged into a normal PVOL or SVOL. It is often used to create a snapshot of the data taken at a known point in time. Although this copy, when split, is often consistent, it is not usually current.

BCV (Business Continuity Volume)
An EMC Symmetrix term that refers to a logical device on the EMC Symmetrix that may be merged into or split from a regular R1 or R2 logical device. It is often used to create a snapshot of the data taken at a known point in time. Although this copy, when split, is often consistent, it is not usually current.
bi-directional configuration
A continental cluster configuration in which each cluster serves the roles of primary and recovery cluster for different recovery groups. Also known as a mutual recovery configuration.

Business Recovery Service
A service provided by a vendor to host the backup systems needed to run mission critical applications following a disaster.

C

campus cluster
A single cluster that is geographically dispersed within the confines of an area owned or leased by the organization such that it has the right to run cables above or below ground between buildings in the campus. Campus clusters are usually spread out in different rooms in a single building, or in different adjacent or nearby buildings. See also extended distance cluster.

cascading failover
The ability of an application to fail from a primary to a secondary location, and then to fail to a recovery location on a different site. The primary location contains a metropolitan cluster built with Metrocluster with EMC SRDF, and the recovery location has a standard Serviceguard cluster.

client reconnect
User access to the backup site after failover. Client reconnect can be transparent, where the user is automatically connected to the application running on the remote site, or manual, where the user selects a site to connect to.

cluster
A Serviceguard cluster is a networked grouping of HP 9000 series 800 and/or HP Integrity servers (host systems known as nodes) having sufficient redundancy of software and hardware that a single failure will not significantly disrupt service. Serviceguard software monitors the health of nodes, networks, application services, and EMS resources, and makes failover decisions based on where the application is able to run successfully.

cluster alarm
The time at which a message is sent indicating that the cluster is probably in need of recovery. The cmrecovercl command is enabled at this time.

cluster alert
The time at which a message is sent indicating a problem with the cluster.

cluster event
A cluster condition that occurs when the cluster goes down or enters an UNKNOWN state, or when the monitor software returns an error. This event may cause an alert message to be sent out, or it may cause an alarm condition to be set, which allows the administrator on the Recovery Cluster to issue the cmrecovercl command. The return of the cluster to the UP state results in a cancellation of the event, which may be accompanied by a cancel event notice. In addition, the cancellation disables the use of the cmrecovercl command.

cluster quorum
A dynamically calculated majority used to determine whether any grouping of nodes is sufficient to start or run the cluster. Cluster quorums prevent split-brain syndrome, which can lead to data corruption or inconsistency. Currently, at least 50% of the nodes plus a tie-breaker are required for a quorum. If no tie-breaker is configured, then greater than 50% of the nodes is required to start and run a cluster.

command device
A disk area in the HP StorageWorks XP series disk array used for internal system communication. You create two command devices on each array, each with alternate links (PV links).

consistency group
A set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database. Consistency groups allow you to configure R1/R2 devices on multiple Symmetrix frames in Metrocluster with EMC SRDF.
continental cluster
A group of clusters that use routed networks and/or common carrier networks for data replication and cluster communication to support package failover between separate clusters in different data centers. Continental clusters are often located in different cities or different countries and can span hundreds or thousands of kilometers.

Continuous Access
A facility provided by the Continuous Access software option available with the HP StorageWorks E Disk Array XP series. This facility enables physical data replication between XP series disk arrays.

D

data center
A physically proximate collection of nodes and disks, usually all in one room.

data consistency
Whether data are logically correct and immediately usable; the validity of the data after the last write. Inconsistent data, if not recoverable to a consistent state, is corrupt.

data currency
Whether the data contain the most recent transactions, and/or whether the replica database has all of the committed transactions that the primary database contains; the speed of data replication may cause the replica to lag behind the primary copy and compromise data currency.

data loss
The inability to take action to recover data. Data loss can be the result of transactions in the process of being copied that were lost when a failure occurred, non-committed transactions that were rolled back as part of a recovery process, data in the process of being replicated that never made it to the replica because of a failure, or transactions that were committed after the last tape backup when a failure occurred that required a reload from the last tape backup. Transaction processing monitors (TPMs), message queuing software, and synchronous data replication are measures that can protect against data loss.

data mirroring
See mirroring.

data recoverability
The ability to take action that results in data consistency, for example database rollback/roll forward recovery.

data replication
The scheme by which data is copied from one site to another for disaster tolerance. Data replication can be either physical (see physical data replication) or logical (see logical data replication). In a Continentalclusters environment, the process by which data that is used by the cluster packages is transferred to the Recovery Cluster and made available for use on the Recovery Cluster in the event of a recovery.

database replication
A software-based logical data replication scheme that is offered by most database vendors.

disaster
An event causing the failure of multiple components or entire data centers that renders all services at a single location unavailable; these include natural disasters such as earthquake, fire, or flood, as well as acts of terrorism or sabotage and large-scale power outages.

disaster protection
Processes, tools, hardware, and software that provide protection in the event of an extreme occurrence that causes application downtime, such that the application can be restarted at a different location within a fixed period of time.

disaster recovery
The process of restoring access to applications and data after a disaster. Disaster recovery can be manual, meaning human intervention is required, or it can be automated, requiring little or no human intervention.

disaster recovery services
Services and products offered by companies that provide the hardware, software, processes, and people necessary to recover from a disaster.

disaster tolerant
The characteristic of being able to recover quickly from a disaster.
Components of disaster tolerance include redundant hardware, data replication, geographic dispersion, partial or complete recovery automation, and well-defined recovery procedures.

disaster tolerant architecture
A cluster architecture that protects against multiple points of failure or a single catastrophic failure that affects many components by locating parts of the cluster at a remote site and by providing data replication to the remote site. Other components of a disaster tolerant architecture include redundant links, either for networking or data replication, that are installed along different routes, and automation of most or all of the recovery process.

E, F

ESCON
Enterprise Systems Connection. A type of fiber-optic channel used for inter-frame communication between EMC Symmetrix frames using EMC SRDF or between HP StorageWorks E XP series disk array units using Continuous Access XP.

event log
The default location (/var/opt/resmon/log/cc/eventlog) where events are logged on the monitoring Continentalclusters system. All events are written to this log, as well as all notifications that are sent elsewhere.

extended distance cluster
A cluster with alternate nodes located in different data centers separated by some distance. Formerly known as a campus cluster.

failback
Failing back from a backup node, which may or may not be remote, to the primary node that the application normally runs on.

failover
The transfer of control of an application or service from one node to another node after a failure. Failover can be manual, requiring human intervention, or automated, requiring little or no human intervention.

filesystem replication
The process of replicating filesystem changes from one node to another.

G

gatekeeper
A small EMC Symmetrix device configured to function as a lock during certain state change operations.

H, I

heartbeat network
A network that provides reliable communication among nodes in a cluster, including the transmission of heartbeat messages, signals from each functioning node, which are central to the operation of the cluster and which determine the health of the nodes in the cluster.

high availability
A combination of technology, processes, and support partnerships that provide greater application or system availability.

J, K, L

local cluster
A cluster located in a single data center. This type of cluster is not disaster tolerant.

local failover
Failover on the same node; this is most often applied to hardware failover. For example, local LAN failover is switching to the secondary LAN card on the same node after the primary LAN card has failed.

logical data replication
A type of on-line data replication that replicates logical transactions that change either the filesystem or the database. Complex transactions may result in the modification of many diverse physical blocks on the disk.

LUN (Logical Unit Number)
A SCSI term that refers to a logical disk device composed of one or more physical disk mechanisms, typically configured into a RAID level.

M

M by N
A type of Symmetrix grouping in which up to two Symmetrix frames may be configured on either side of a data replication link in a Metrocluster with EMC SRDF configuration. M by N configurations include 1 by 2, 2 by 1, and 2 by 2.

Maintenance mode
A recovery group is in maintenance mode when it is disabled. The cmrecovercl -d command moves a recovery group into maintenance mode. The cmrecovercl -e command moves the recovery group out of maintenance mode.
When a recovery group is in maintenance mode, recovery is not allowed.

manual failover
Failover requiring human intervention to start an application or service on another node.

Metrocluster
A Hewlett-Packard product that allows a customer to configure a Serviceguard cluster as a disaster tolerant metropolitan cluster.

metropolitan cluster
A cluster that is geographically dispersed within the confines of a metropolitan area requiring right-of-way to lay cable for redundant network and data replication components.

mirrored data
Data that is copied using mirroring.

mirroring
Disk mirroring hardware or software, such as MirrorDisk/UX. Some mirroring methods may allow splitting and merging.

mission critical application
Hardware, software, processes, and support services that must meet the uptime requirements of an organization. Examples of mission critical applications that must be able to survive regional disasters include financial trading services, e-business operations, 911 phone service, and patient record databases.

mission critical solution
The architecture and processes that provide the required uptime for mission critical applications.

multiple points of failure (MPOF)
More than one point of failure that can bring down a Serviceguard cluster.

multiple system high availability
Cluster technology and architecture that increases the level of availability by grouping systems into a cooperative failover design.

mutual recovery configuration
A continental cluster configuration in which each cluster serves the roles of primary and recovery cluster for different recovery groups. Also known as a bi-directional configuration.

N

network failover
The ability to restore a network connection after a failure in network hardware when there are redundant network links to the same IP subnet.

notification
A message that is sent following a cluster or package event.

O

off-line data replication
Data replication by storing data off-line, usually on a backup tape or disk stored in a safe location; this method is best for applications that can accept a 24-hour recovery time.

on-line data replication
Data replication by copying to another location that is immediately accessible. On-line data replication is usually done by transmitting data over a link in real time or with a slight delay to a remote site; this method is best for applications requiring quick recovery (within a few hours or minutes).

P

primary cluster
A cluster in production that has packages protected by the HP Continentalclusters product.

package alert
The time at which a message is sent indicating a problem with a package.

package event
A package condition such as a failure that causes a notification message to be sent. Package events can be accompanied by alerts, but not alarms. Messages are for information only; the cmrecovercl command is not enabled for a package event.

package recovery group
A set of one or more packages with a mapping between their instances on the primary cluster and their instances on the Recovery Cluster.

physical data replication
An on-line data replication method that duplicates I/O writes to another disk on a physical block basis. Physical replication can be hardware-based, where data is replicated between disks over a dedicated link (for example, EMC's Symmetrix Remote Data Facility or the HP StorageWorks E Disk Array XP series Continuous Access), or software-based, where data is replicated on multiple disks using dedicated software on the primary node (for example, MirrorDisk/UX).
planned downtime
An anticipated period of time when nodes are taken down for hardware maintenance, software maintenance (OS and application), backup, reorganization, upgrades (software or hardware), and so on.

PowerPath
A host-based software product from EMC that delivers intelligent I/O path management. PowerPath is used for M by N Symmetrix configurations using Metrocluster with EMC SRDF.

primary package
The package that normally runs on the primary cluster in a production environment.

pushbutton failover
Use of the cmrecovercl command to allow all package recovery groups to start up on the Recovery Cluster following a significant cluster event on the primary cluster.

PV links
A method of LVM configuration that allows you to provide redundant disk interfaces and buses to disk arrays, thereby protecting against single points of failure in disk cards and cables.

PVOL
A primary volume configured in an XP series disk array that uses Continuous Access. PVOLs are the primary copies in physical data replication with Continuous Access on the XP.

Q

quorum
See cluster quorum.

quorum server
A cluster node that acts as a tie-breaker in a disaster tolerant architecture in case all of the nodes in a data center go down at the same time. See also arbitrator.

R

R1
The Symmetrix term indicating the data copy that is the primary copy.

R2
The Symmetrix term indicating the remote data copy that is the secondary copy. It is normally read-only by the nodes at the remote site.

RDF-ECA (RDF Enginuity Consistency Assist)
A Solutions Enabler feature that provides consistency protection for SRDF/Synchronous devices. RDF-ECA is used for M by N Symmetrix configurations using Metrocluster with EMC SRDF.

Recovery Cluster
A cluster on which recovery of a package takes place following a failure on the primary cluster.

recovery group failover
A failover of a package recovery group from one cluster to another.

recovery package
The package that takes over on the Recovery Cluster in the event of a failure on the primary cluster.

regional disaster
A disaster, such as an earthquake or hurricane, that affects a large region. Local, campus, and proximate metropolitan clusters are less likely to protect from regional disasters.

rehearsal package
The recovery cluster package used to validate the recovery environment and procedure as part of a rehearsal operation.

remote failover
Failover to a node at another data center or remote location.

resynchronization
The process of making the data between two sites consistent and current once systems are restored following a failure. Also called data resynchronization.

rolling disaster
A second disaster that occurs before recovering from a previous disaster; for example, while data is being synchronized between two data centers after a disaster, one of the data centers fails, interrupting the data synchronization process. Rolling disasters may result in data corruption that requires a reload from tape backups.

S

single point of failure (SPOF)
A component of a cluster or node that, if it fails, affects access to applications or services. See also multiple points of failure.

single system high availability
Hardware design that results in a single system that has availability higher than normal. Hardware design examples are:
• n+1 fans
• n+1 power supplies
• multiple power cords
• on-line addition or replacement of I/O cards, memory, etc.

special device file
The device file name that the HP-UX operating system gives to a single connection to a node, in the format /dev/devtype/filename.
split-brain syndrome
When a cluster reforms with equal numbers of nodes at each site, and each half of the cluster thinks it is the authority, starts up the same set of applications, and tries to modify the same data, resulting in data corruption. Serviceguard architecture prevents split-brain syndrome in all cases unless dual cluster locks are used.

SRDF (Symmetrix Remote Data Facility)
A level 1-3 protocol used for physical data replication between EMC Symmetrix disk arrays.

SVOL
A secondary volume configured in an XP series disk array that uses Continuous Access. SVOLs are the secondary copies in physical data replication with Continuous Access on the XP.

SymCLI
The Symmetrix command line interface used to configure and manage EMC Symmetrix disk arrays.

Symmetrix device number
The unique device number that identifies an EMC logical volume.

synchronous data replication
Data replication in which each replication I/O waits for the preceding I/O to complete before beginning. Synchronous replication minimizes the chance of inconsistent or corrupt data in the event of a rolling disaster.

T

transaction processing monitor (TPM)
Software that allows you to modify an application to store in-flight transactions in an external location until that transaction has been committed to all possible copies of the database or filesystem, thus ensuring completion of all copied transactions. A TPM protects against data loss at the expense of the CPU overhead involved in applying the transaction in each database replica. A TPM provides a reliable mechanism to ensure that all transactions are successfully committed, and may also provide load balancing among nodes.

transparent failover
A client application that automatically reconnects to a new server without the user taking any action.

transparent IP failover
Moving the IP address from one network interface card (NIC), in the same node or another node, to another NIC that is attached to the same IP subnet, so that users or applications may always specify the same IP name/address whenever they connect, even after a failure.

U-Z

volume group
In LVM, a set of physical volumes such that logical volumes can be defined within the volume group for user access. A volume group can be activated by only one node at a time unless you are using Serviceguard OPS Edition. Serviceguard can activate a volume group when it starts a package. A given disk can belong to only one volume group. A logical volume can belong to only one volume group.

WAN data replication solutions
Data replication that functions over leased or switched lines. See also continental cluster.
444 Glossary Index A critical_package, 328 CVM/CFS, 115 adding a node to Continentalclusters configuration, 108 arbitrator nodes, 29, 30 D C cluster calculating quorum, 30 continental, 51 extended distance, 25 metropolitan, 25 recovery, 62 clusterware, 126 cmdeleteconcl command, 114 cmrecovercl, 95 command line cmrecovercl, 95 symdg, 237 command line interface, EMC Symmetrix, 230 configuring a three-data-center architecture, 28 additional nodes in Continentalclusters, 108 arbitrator nodes, 29 configuring, 59 Continentalcluster Recovery cluster hardware, 62 Continentalclusters recovery cluster, 62 data replication for Continentalclusters, 59 gatekeeper devices, 238 monitoring in Continentalclusters monitor packages in Continentalclusters, 67 Primary cluster, 60 verifiying EMC Symmetrix configuration, 239 verifiying XP series configuration, 200 configuring continental clusters, 71 configuring for Continentalclusters, 59 configuring MetroCluster, 250 continental cluster, 51, 59 deleting, 114 renaming, 115 Continentalclusters, 37, 117 checking status, 110 configuration file, 71 installing, 65 log files, 113 monitor package, 67 monitoring, 69 Recovery Group Rehearsal, 47 switching clusters, 93 Continentalclusters overview, 59 creating EMC Symmetrix device groups, 237 volume groups, 205, 239 data center one, 26 three, 26 data replication, 59 over WAN, 52 restoring after a disaster, 99 DC1HOST package control script variables, 159, 173, 177, 220 deleting a continental cluster, 114 device groups creating, 237 device names EMC Symmetrix logical devices, 238 mapping, 232 mapping Symmetrix to command line symld, 238 mapping Symmetrix to HP-UX, 235 device names, EMC Symmetrix, 231 devices gatekeeper, 238 Disaster Recovery Performing, 47 disaster recovery automating with MetroCluster, 250 using Continentalclusters, 93 disaster tolerance restoring to Continentalclusters, 99 disaster tolerant Continentalclusters architecture, 51 Continentalclusters worksheet, 57 Metrocluster hardware, 33 MetroCluster supported architectures, 25 Metrocluster worksheet, 33 package worksheet, 34 disk command line interface, 230 device names, 231 serial number, 232 distributing MetroCluster configuration, 160, 173, 177, 212, 220 E EMC SRDF with Continentalclusters Continentalcluster with EMC SRDF, 59 EMC Symmetrix creating device groups, 237 device names, 231 gatekeeper devices, 238 serial number, 232 445 SymCLI database, 230 verify configuration, 239 EMC Symmetrix logical device names, 238 exporting volume groups, 206, 239 extended distance cluster, 25 G gatekeeper devices, 238 H hardware for Continentalcluster Recovery cluster, 62 I importing volume groups, 206, 239 installing Continentalclusters product, 65 L log files reviewing in Continentalclusters, 113 logical device names, EMC Symmetrix, 238 M Maintenance Mode cmrecovercl -d -g, 44 managed_package, 328 mapping EMC Symmetrix and HP-UX devices, 235, 238 mapping Symmetrix and HP-UX devices, 232 MC/ServiceGuard, package configuration, 159, 173, 177, 220 MetroCluster, 230 configuring, 250 Metrocluster, 227 Metrocluster/CA, 133, 185 metropolitan cluster, 25 modifying Continentalclusters configuration file, 71 monitoring, 88 receiving Continentalclusters notification, 94 sample package configuration file for Continentalclusters, 69 N network redundant, 26 node adding to Continentalclusters, 108 nodes allowed in three-data-center architecture, 28 arbitrator nodes, 29 notifications receiving, 94 446 Index O operations staff for Continentalclusters, 57 Oracle Cluster Registry, 
325 Oracle Cluster Software/CRS, 126 Oracle Clusterware, 126 Oracle clusterware sub-cluster, 325 Oracle Database Configuration Assistant, 352 Oracle NETCA, 348 Oracle RAC, 117 Oracle RAC instances, 115 P package for Continentalclusters monitoring, 69 worksheet, 34 package control script distributing MetroCluster script to nodes, 160, 173, 177, 212, 220 package switching via cmrecovercl command, 95 pairresync, 164 planning continental cluster, 57 post.cmapply script, 254 power planning worksheet, 57, 58 Primary cluster configuring, 60 Q quorum, 30 R RAC, 115, 117 recovery cluster, 62 Recovery Groups configuring with rehearsal package, 64 Maintenance Mode, 44 rehearsal package IP address, 50 renaming a continental cluster, 115 replicating data over WAN, 52 required software, 230 restoring SRDF links, 254 RETRY, 253 RETRYTIME, 253 RUN_SCRIPT_TIMEOUT, 251 S sample scripts, 229 scripts automating configuration, 229 post.cmapply for MetroCluster, 254 serial number, EMC Symmetrix, 232 Serviceguard with Continentalclusters, 37 with Metrocluster, 227 with Metrocluster/CA, 133, 185 Shared Disk configuration, 69 single data center, 26 Site Aware Disaster Tolerant Architecture, 323 Site Controller package, 327 site coordination for Continentalclusters notification, 94 special device files, mapping to EMC Symmetrix, 235 splitting SRDF links during volume group configuration, 240 SRDF links restoring, 254 splitting, 240 starting for Continentalclusters, 88 starting the monitor packaing in Continentalclusters, 88 status checking status of Continentalclusters objects, 110 switching to a recovery cluster using Continentalclusters, 93 SymCLI database, 230 symdg, 237 symld command, 238 T testing Continentalclusters configuration, 89 three-data-center architecture, 25 V Veritas, 115 Veritas Cluster Volume Manager/Cluster File System, 115, 117 Veritas CVM/CFS, 117 volume groups creating, 205, 239 importing and exporting, 206, 239 Voting disks, 343 W worksheet Continentalclusters, 57 power supply configuration, 57, 58 worksheet, Metrocluster, 33 worksheet, package, 34 X XP series verify configuration, 200 447