Patching Cluster with HA-Zones


This documentation can be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

Unless required by applicable law, this documentation is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

This documentation should not be used as a replacement for a valid Oracle service contract and/or an Oracle service engagement. Failure to follow Oracle guidelines for installation and/or maintenance could result in service/warranty issues with Oracle.

Use of this documentation is at your own risk!

--Tom Stevenson (talk) 17:11, 26 May 2015 (EDT)


Index

Banner 8 setups			 (Still a work in progress)
T5440 Setup			 (Still a work in progress) 
M5000 Setup			 (Still a work in progress) 
Solaris 10 Setup		 (Still a work in progress) 
Fair Share Scheduler		 (Still a work in progress) 
Resource Pools			 (Still a work in progress) 
Solaris Cluster 3.2		 (Still a work in progress) 
Solaris Zones			 (Still a work in progress) 
Patching Cluster with HA-Zones	 (Still a work in progress) 

SVM and BE setup

All of our global nodes have four internal disks. This allows for the configuration of two Live Upgrade boot environments (BEs), each with mirrored root disks. Each time the systems are patched using Live Upgrade, the two BEs are alternated. The following is the Solaris Volume Manager (SVM) configuration used on all global nodes.

/ on BE1.

d0 -m d10 d20 1
d10 1 1 c0t0d0s0
d20 1 1 c0t1d0s0

/var on BE1.

d3 -m d13 d23 1
d13 1 1 c0t0d0s3
d23 1 1 c0t1d0s3

/ on BE2.

d100 -m d110 d120 1
d110 1 1 c0t2d0s0
d120 1 1 c0t3d0s0

/var on BE2.

d103 -m d113 d123 1
d113 1 1 c0t2d0s3
d123 1 1 c0t3d0s3

Swap is shared between both BE1 and BE2.

d1 -m d11 d21 d31 d41 1
d11 1 1 c0t0d0s1
d21 1 1 c0t1d0s1
d31 1 1 c0t2d0s1
d41 1 1 c0t3d0s1

The Solaris Cluster global file system (GFS) is shared between both BE1 and BE2. In order to make the "d" numbers unique across global nodes, the node ID number is used as part of the "d" number: node 1 uses "d" numbers of the form d1#6, node 2 uses d2#6, and node 3 uses d3#6. The example below reflects the "d" numbers used for the GFS on node 3.

d306 -m d316 d326 d336 d346 1
d316 1 1 c0t0d0s6
d326 1 1 c0t1d0s6
d336 1 1 c0t2d0s6
d346 1 1 c0t3d0s6
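
For reference, a four-way mirror like the node-3 GFS metadevice would typically be built along the following lines (a sketch only; "metainit" may need the -f option if a slice is in use, and the remaining submirrors are attached after the mirror is created):

# metainit d316 1 1 c0t0d0s6
# metainit d326 1 1 c0t1d0s6
# metainit d336 1 1 c0t2d0s6
# metainit d346 1 1 c0t3d0s6
# metainit d306 -m d316
# metattach d306 d326
# metattach d306 d336
# metattach d306 d346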

HA-Zones setup

All of our HA-Zones are built on HAStoragePlus (HASP) SAN storage and are configured with either UFS or ZFS file systems. The storage for every HA-Zone is attached to all global nodes in the cluster.

The mount point used by each HASP resource follows the same syntax for all HA-Zones, regardless of the storage type (UFS or ZFS): "/zones/hosts/${ZONENAME}", where ${ZONENAME} is the name of the HA-Zone.

The "zonepath" setting used by each zone likewise follows the same syntax for all HA-Zones, regardless of the storage type (UFS or ZFS). When the zone is initially created, the zonepath is "/zones/hosts/${ZONENAME}/os", where ${ZONENAME} is the name of the HA-Zone, and "os" is a simple directory within the HASP file system "/zones/hosts/${ZONENAME}". The "zonepath" value is altered by the "lucreate" and "luactivate" processes described later.

See Sections "Downloading patches using smpatch" and "Creating the new BEs" below.
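
For illustration only, the initial configuration of an HA-Zone looks roughly like the following sketch. The zone name "examplezone1", the IP address, and the NIC name are placeholders, and the real zones carry site-specific settings; autoboot is set to false here on the assumption that Sun Cluster, not the global zone's boot sequence, controls when each HA-Zone starts.

# zonecfg -z examplezone1
zonecfg:examplezone1> create
zonecfg:examplezone1> set zonepath=/zones/hosts/examplezone1/os
zonecfg:examplezone1> set autoboot=false
zonecfg:examplezone1> add net
zonecfg:examplezone1:net> set address=192.0.2.10
zonecfg:examplezone1:net> set physical=e1000g0
zonecfg:examplezone1:net> end
zonecfg:examplezone1> commit
zonecfg:examplezone1> exit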

Downloading patches using smpatch

Download patches using "smpatch".

# cd /var/sadm/spool
# smpatch analyze > analyze
# smpatch download -x idlist=analyze
# mkdir unzip
# unzip -o /var/sadm/spool/'*.jar' -d /var/sadm/spool/unzip
# cd unzip

Go through all of the README files from each unzipped patch file and generate a "/var/sadm/spool/unzip/patch_order" file, which will be used with the "luupgrade" command.
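
As a starting point only (the final ordering must still respect the dependencies called out in the README files), the unzipped patch directory names can be used to seed the file; this is a sketch:

# cd /var/sadm/spool/unzip
# ls -d [0-9]*-[0-9]* > patch_order
# vi patch_order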

Deleting HA-Zones in installed mode

The HA-Zones fall into the following two categories:

  1. Those HA-Zones that are actively running in a particular global node.
  2. Those HA-Zones that are configured and in "installed" mode, but are not running in a particular global node.

Only those HA-Zones that are actively running on a particular global node can be patched on that node, because the HASP file system for an HA-Zone in "installed" mode is not mounted on that node. If the "lucreate" command detects an HA-Zone in "installed" mode, it will fail.

Before running "lucreate", delete each HA-Zone that is in "installed" mode (that is, each HA-Zone that is running on one of the other global nodes). The following displays the HA-Zones from one of the global nodes as an example:

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         running    /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         running    /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      running    /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       running    /zones/hosts/workflowp1/os     native   shared
   - banpcold1        installed  /zones/hosts/banpcold1/os      native   shared
   - banpinb2         installed  /zones/hosts/banpinb2/os       native   shared
   - banpsch1         installed  /zones/hosts/banpsch1/os       native   shared
   - banpssb2         installed  /zones/hosts/banpssb2/os       native   shared
   - cognospapp1      installed  /zones/hosts/cognospapp1/os    native   shared
   - cognospweb1      installed  /zones/hosts/cognospweb1/os    native   shared
   - edi              installed  /zones/hosts/edi/os            native   shared
   - lumpapp1         installed  /zones/hosts/lumpapp1/os       native   shared
   - wsupemgc1        installed  /zones/hosts/wsupemgc1/os      native   shared

# zonecfg -z banpcold1 delete -F
# zonecfg -z banpinb2 delete -F
# zonecfg -z banpsch1 delete -F
# zonecfg -z banpssb2 delete -F
# zonecfg -z cognospapp1 delete -F
# zonecfg -z cognospweb1 delete -F
# zonecfg -z edi delete -F
# zonecfg -z lumpapp1 delete -F
# zonecfg -z wsupemgc1 delete -F

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         running    /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         running    /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      running    /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       running    /zones/hosts/workflowp1/os     native   shared

Repeat the above process for all global nodes in the cluster.

Keep track of which HA-Zones' zonecfg data are deleted from which global nodes. This information will be needed when the zonecfg data are recreated (see Sections "Updating the zonecfg files", "Recreating the deleted zonecfg configuration data", and "Convert each configured HA-Zone to installed mode").
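
One simple way to keep this record is to capture the zone listing on each global node before the deletions; the file name below is only an illustration:

# zoneadm list -cv > /global/zones/config/zones_before_patching.`hostname`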

Creating the new BEs

After all "installed" HA-Zones have been deleted from all global nodes in the cluster, create a BE in each global node of the cluster using the same BE name on all global nodes. The BE name is derived from the tool used to download the patches and the download date. For example, if "smpatch" was used to download the patches on Aug. 1, 2010, the BE name would be "smpatch_2010_08_01".

Because the /global file system and swap are on local storage shared between both BEs, only the "/" and "/var" file systems are included in the "lucreate" command (see below). The example below assumes the current BE is using d100 and d103 for the "/" and "/var" file systems, and that d0 and d3 will hold the new BE (see Section "SVM and BE setup").

# lucreate -n smpatch_2010_08_01 -m /:/dev/md/dsk/d0:ufs -m /var:/dev/md/dsk/d3:ufs

In addition to creating a new BE on d0 and d3 for the global node, lucreate also creates a new BE for each of the active HA-Zones by defining a new "zonepath" setting for each HA-Zone. The new "zonepath" setting is based on the current "zonepath" for each HA-Zone plus the "lucreate" name (see next paragraph for an example using HA-Zone "cognospweb2").

As an example, the HASP for the "cognospweb2" HA-Zone is mounted as "/zones/hosts/cognospweb2", and the initial "zonepath" is set to directory "/zones/hosts/cognospweb2/os". The "lucreate" command used above creates a new "zonepath" for the "cognospweb2" BE called "/zones/hosts/cognospweb2/os-smpatch_2010_08_01". This same process repeats for each running HA-Zone on each global node running the "lucreate" command.

Wait until the new BE for the global node and all of its HA-Zones has been created on every global node in the cluster before continuing. If there are any errors during the BE creation process, do NOT continue.
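
Before continuing, the new BE on each global node can be checked with "lustatus"; the new BE should be reported as complete and not yet active:

# lustatus smpatch_2010_08_01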

Patching the new BEs

Once all of the new BEs (global nodes and HA-Zones) have been created (see Section "Creating the new BEs"), use the "luupgrade" command on each global node to patch the global node and all of the HA-Zones running on that global node. The unzipped patches were stored in directory "/var/sadm/spool/unzip", and the "patch_order" file was created as "/var/sadm/spool/unzip/patch_order" (see Sections "Creating the new BEs" and "Downloading patches using smpatch").

# luupgrade -n smpatch_2010_08_01 -t -s /var/sadm/spool/unzip `cat /var/sadm/spool/unzip/patch_order`

The above command will patch the BE for the global node and all of the HA-Zones in the global node.

Wait until the new BE for the global node and all of its HA-Zones has been patched on every global node in the cluster before continuing. If there are any errors during the BE patching process, do NOT continue.

Activating the new BEs

If the patching process completed without errors on every node (see Section "Patching the new BEs"), activate the new BE on all of the global nodes in the cluster.

Using the example BE name from above (see Sections "Creating the new BEs" and "Patching the new BEs"), execute the following:

# luactivate -n smpatch_2010_08_01

Wait until the new BE for the global node and all of its HA-Zones has been activated on every global node in the cluster before continuing. If there are any errors during the BE activation process, do NOT continue.

Booting the new BEs

If the activation process completed without errors on every node (see Section "Activating the new BEs"), shut down all of the HA-Zones in the cluster using the "scswitch -F -g ${ZONENAME}-rg" command, where ${ZONENAME}-rg is the name of the SC Resource Group for each HA-Zone in the cluster.
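
Using the EDI HA-Zone as an example:

# scswitch -F -g EDI-rg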

Once all of the HA-Zones are down, shut down all of the global nodes.

Once all of the global nodes are down, reboot all of the global nodes. During the reboot process, all of the HA-Zones will attempt to restart.
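
Note that "luactivate" requires each node to be brought down with "init" or "shutdown" (not "reboot", "halt", or "uadmin") for the newly activated BE to be booted. A sketch of the shutdown/boot cycle on a SPARC global node:

# init 0
ok boot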

Depending on timings of the global node reboots, some HA-Zones may attempt to start on nodes other than their primary global node.

Because the "zonecfg" definitions are only defined on the primary global nodes (see Section "Deleting HA-Zones in installed mode"), any HA-Zone that attempts to start up on a global node other than its primary will fail. This is not a problem. By the time the initial HA-Zone failure occurs, all of the global nodes should be up, and the next time SC tries to start the HA-Zone, it should be able to start it on the HA-Zone's primary global node.

Even if one of the global nodes is very slow to come up fully, SC will try multiple times to restart a failed HA-Zone, so even multiple HA-Zone failures are not a problem.

The only time a HA-Zone failure is a real failure (at this point in the process) is if it fails while trying to start up on its primary global node. If this occurs, the cause of the problem must be determined before continuing.

Wait until all of the global nodes and all of the HA-Zones on each global node in the cluster have been rebooted using the activated BE before continuing. If there are any errors during the BE reboot process, do NOT continue.

Updating the zonecfg files

If the reboot process completed without errors on every node (see Section "Booting the new BEs"), the "zonecfg" information for each running HA-Zone needs to be exported and repopulated to each global node in the cluster.

During the "reboot" process (see Section "Booting the new BEs"), the "zonepath" setting for each running HA-Zone is updated with the new "zonepath" value (see Section "Creating the new BEs").

Using HA-Zone "banpinb1" and "lucreate" name of "smpatch_2010_08_01" as an example, if the original "zonepath" for HA-Zone "banpinb1" was "/zones/hosts/banpinb1/os", after the reboot, it would be "/zones/hosts/banpinb1/os-smpatch_2010_08_01". The same conversion would occur for each running HA-Zone.

Determine the names of each running HA-Zone for every global node in the cluster. Using one node as an example:

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         running    /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         running    /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      running    /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       running    /zones/hosts/workflowp1/os     native   shared

The zone configuration file for each HA-Zone is stored in the GFS at the following location: /global/zones/config/${ZONENAME}/zonecfg, where ${ZONENAME} is the name of the HA-Zone. This zone configuration file needs to be updated with the current "zonecfg" data for each running HA-Zone. Using HA-Zone "banpinb1" as an example, update its zone configuration file from the output of the "zonecfg" export for that HA-Zone:

# zonecfg -z banpinb1 export > /global/zones/config/banpinb1/zonecfg

Repeat this process for every running HA-Zone on each global node (these steps must be done on the global node running each active HA-Zone).
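
On each global node, this can be wrapped in a small loop over the running non-global zones (a sketch, assuming the /global/zones/config/${ZONENAME} directories already exist):

# for Z in `zoneadm list | grep -v '^global$'`; do zonecfg -z $Z export > /global/zones/config/$Z/zonecfg; done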

Recreating the deleted zonecfg configuration data

For each HA-Zone zonecfg configuration that was deleted (see Section "Deleting HA-Zones in installed mode"), recreate the HA-Zone zonecfg configuration by executing:

# zonecfg -z ${ZONENAME} -f /global/zones/config/${ZONENAME}/zonecfg

where ${ZONENAME} is the name of each HA-Zone that was deleted on each global node (see Section "Deleting HA-Zones in installed mode"). Using HA-Zone banpinb1 as an example:

# zonecfg -z banpinb1 -f /global/zones/config/banpinb1/zonecfg

The above step is repeated for each HA-Zone that was deleted (see Section "Deleting HA-Zones in installed mode").
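
If several HA-Zones were deleted on a given global node, the recreation can also be scripted; a sketch using three of the zone names from the earlier example:

# for Z in banpcold1 banpinb2 banpsch1; do zonecfg -z $Z -f /global/zones/config/$Z/zonecfg; done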

After all of the zonecfg data has been recreated on each global node, the output from "zoneadm list -cv" should look something like this (using one global node as an example):

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         running    /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         running    /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      running    /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       running    /zones/hosts/workflowp1/os     native   shared
   - banpcold1        configured /zones/hosts/banpcold1/os      native   shared
   - banpinb2         configured /zones/hosts/banpinb2/os       native   shared
   - banpsch1         configured /zones/hosts/banpsch1/os       native   shared
   - banpssb2         configured /zones/hosts/banpssb2/os       native   shared
   - cognospapp1      configured /zones/hosts/cognospapp1/os    native   shared
   - cognospweb1      configured /zones/hosts/cognospweb1/os    native   shared
   - edi              configured /zones/hosts/edi/os            native   shared
   - lumpapp1         configured /zones/hosts/lumpapp1/os       native   shared
   - wsupemgc1        configured /zones/hosts/wsupemgc1/os      native   shared

The recreated zonecfg configuration data is indicated by the "configured" status.

Convert each configured HA-Zone to installed mode

If all of the zonecfg configurations were recreated successfully (see Section "Recreating the deleted zonecfg configuration data"), convert the HA-Zones from "configured" to "installed" status.

The HA-Zone Resource names can be found using the command:

# scstat -g | grep "Resource: zone-" | grep Online

For example, the output may look like this:

# scstat -g | grep "Resource: zone-" | grep Online
  Resource: zone-BANPINB1-rs banpapp3                 Online         Online - Service is online.
  Resource: zone-BANPSSB2-rs banpapp2                 Online         Online - Service is online.
  Resource: zone-BANPINB2-rs banpapp2                 Online         Online - Service is online.
  Resource: zone-BANPSCH1-rs banpapp1                 Online         Online - Service is online.
  Resource: zone-BANPSSB1-rs banpapp3                 Online         Online - Service is online.
  Resource: zone-COGNOSPWEB1-rs banpapp2                 Online         Online - Service is online.
  Resource: zone-COGNOSPWEB2-rs banpapp3                 Online         Online - Service is online.
  Resource: zone-EDI-rs    banpapp1                 Online         Online - Service is online.
  Resource: zone-COGNOSPAPP1-rs banpapp1                 Online         Online - Service is online.
  Resource: zone-LUMPAPP1-rs banpapp2                 Online         Online - Service is online.
  Resource: zone-WORKFLOWP1-rs banpapp3                 Online         Online - Service is online.
  Resource: zone-BANPCOLD1-rs banpapp1                 Online         Online - Service is online.
  Resource: zone-WSUPEMGC1-rs banpapp1                 Online         Online - Service is online.

The value in the second column is the name of the HA-Zone zone Resource.

The value in the third column is the name of the primary global node that the HA-Zone should run on. Keep this information; it will be used in Section "Reboot all of the HA-Zones on their primary global node".

For each zone Resource listed in the output from the above command, disable that resource using the "scswitch -n -j ${ZONE_RS}" command, replacing ${ZONE_RS} with the HA-Zone zone Resource name. Using the EDI zone as an example:

# scswitch -n -j zone-EDI-rs

Once all of the HA-Zone zones are shut down, fail each HA-Zone Resource Group over to the first global node. Again using the EDI zone, with banpapp1 as the first global node, as an example:

# scswitch -z -g EDI-rg -h banpapp1

Repeat this process for all HA-Zones.

Once all of the HA-Zones have been failed over to global node 1, convert the status of each HA-Zone from "configured" to "installed" using the following process.

Locate all of the "configured" HA-Zones using the "zoneadm list -cv" command (using one global node as an example):

# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         installed  /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         installed  /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      installed  /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       installed  /zones/hosts/workflowp1/os     native   shared
   - banpcold1        configured /zones/hosts/banpcold1/os      native   shared
   - banpinb2         configured /zones/hosts/banpinb2/os       native   shared
   - banpsch1         configured /zones/hosts/banpsch1/os       native   shared
   - banpssb2         configured /zones/hosts/banpssb2/os       native   shared
   - cognospapp1      configured /zones/hosts/cognospapp1/os    native   shared
   - cognospweb1      configured /zones/hosts/cognospweb1/os    native   shared
   - edi              configured /zones/hosts/edi/os            native   shared
   - lumpapp1         configured /zones/hosts/lumpapp1/os       native   shared
   - wsupemgc1        configured /zones/hosts/wsupemgc1/os      native   shared

For each of the "configured" HA-Zones, convert the status to "installed" by executing the following:

# zoneadm -z ${ZONENAME} attach -F

where ${ZONENAME} is the name of the configured HA-Zone. After all of the HA-Zones have been reconfigured, the output from "zoneadm list -cv" will reflect the updated status (using one global node as an example):

# zoneadm list -cv                                    
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 banpinb1         installed  /zones/hosts/banpinb1/os       native   shared
   2 banpssb1         installed  /zones/hosts/banpssb1/os       native   shared
   3 cognospweb2      installed  /zones/hosts/cognospweb2/os    native   shared
   4 workflowp1       installed  /zones/hosts/workflowp1/os     native   shared
   - banpcold1        installed  /zones/hosts/banpcold1/os      native   shared
   - banpinb2         installed  /zones/hosts/banpinb2/os       native   shared
   - banpsch1         installed  /zones/hosts/banpsch1/os       native   shared
   - banpssb2         installed  /zones/hosts/banpssb2/os       native   shared
   - cognospapp1      installed  /zones/hosts/cognospapp1/os    native   shared
   - cognospweb1      installed  /zones/hosts/cognospweb1/os    native   shared
   - edi              installed  /zones/hosts/edi/os            native   shared
   - lumpapp1         installed  /zones/hosts/lumpapp1/os       native   shared
   - wsupemgc1        installed  /zones/hosts/wsupemgc1/os      native   shared

Fail all of the HA-Zones over to global node 2, and repeat the process for each of the "configured" HA-Zones (the list of HA-Zones will not necessarily be the same on all of the global nodes, so each global node must be checked separately).
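
Again using the EDI Resource Group as an example, and assuming "banpapp2" is the second global node:

# scswitch -z -g EDI-rg -h banpapp2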

Repeat the same process for all global nodes in the cluster until all HA-Zones are defined with "installed" status.

Reboot all of the HA-Zones on their primary global node

Using the information saved in Section "Convert each configured HA-Zone to installed mode", fail each HA-Zone over to its primary global node. In the HA-Zone EDI example, the "scstat -g" output from that section contains the following:

 Resource: zone-EDI-rs    banpapp1                 Online         Online - Service is online.

This indicates that its primary global node is "banpapp1". Fail the EDI Resource Group, EDI-rg, to "banpapp1":

# scswitch -z -g EDI-rg -h banpapp1

Repeat this for every HA-Zone. Once all of the HA-Zones are rehomed to their primary global node, restart all of the zone resources. In the HA-Zone EDI example:

# scswitch -e -j zone-EDI-rs

These two steps can be replaced by the following one step per HA-Zone, as indicated in the following EDI example:

# scswitch -Z -g EDI-rg

This will move the HA-Zone to its primary global node and enable the resource zone-EDI-rs as one step. Repeat for each HA-Zone.


--Tom Stevenson (talk) 13:52, 16 April 2013 (EDT)
