GemStone/S 64 Bit Supplemental Documentation

Upgrading via a Hot Standby


GemStone’s hot standby failover mechanism can be used to perform a version upgrade as part of a failover with role reversal. This process reduces the downtime needed to perform both the upgrade and the role switch, in which the former slave becomes the master and the former master becomes the slave.

In this process, you first upgrade the slave’s binaries, then fail over from the master. The former slave, now the master, is upgraded. The former master is then reconnected as the new slave and is upgraded in the background via transaction log replay.

This process has been verified for upgrade from version 3.6.6 to v3.7, and is supported for upgrade from earlier 3.6.x versions to 3.6.6 or later. However, this process is complex and unknown issues may occur in some version upgrade paths; you should do thorough testing before using this on critical systems.

This upgrade process has not been verified in a system with multiple slaves.

Upgrade via hot standby

To upgrade via hot standby:

Prepare the environment

1. Version 3.7 contains the symbol #OptimizedSelectors, which does not exist by default in a v3.6.6 image. Since symbols cannot be created while in restore mode, you should create this symbol in the 3.6.6 master environment, before starting the upgrade.

2. Log in to the master system as any user, and execute code that creates this symbol. For example:

topaz 1> exec #OptimizedSelectors printString %

3. Stop the slave system’s logreceiver and the slave stone. Restart both using the upgrade target version binaries, and restart the continuous restore. Do not perform upgradeImage yet.

The slave will continue to restore transactions from the master, although logins will report version mismatch errors; logging in as SystemUser is allowed. These errors are temporary and will stop once the upgrade process is complete.

You should not attempt to execute other code on the slave; the version mismatch means that methods (other than those required for running as a slave) may fail unexpectedly. This includes the handling of user errors in the hot standby upgrade process; if, for example, you enter an incorrect path for continuous restore, the resulting error cannot be handled normally and its message may not provide details.

Fail over the master and perform the upgrade

During this period, the system will be unavailable for application commits.

4. On the master, which is still on the origin version, perform failOverToSlave. This stops further commits, does a checkpoint, and puts the now former master into restore mode. From this point, commits are disallowed on the former master. You may shut it down, or leave it running so that it is available for read-only queries.
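
As a minimal sketch of step 4, the failover can be triggered from a session logged in to the master. This assumes that failOverToSlave is sent to SystemRepository, in the same way as the other restore messages used in this procedure, and that the session has the privileges the operation requires:

```smalltalk
"Run on the master, which is still on the origin version.
 Assumption: failOverToSlave is understood by SystemRepository."
SystemRepository failOverToSlave.
```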

5. On the slave, wait for the failover record to be replayed, stop the logreceiver, stop continuous restore, and perform a commitRestore.

To determine when the failover record has been replayed, use SystemRepository restoreStatusInfo at: 13, which returns a non-zero timestamp once the failover transaction has been replayed. On the slave system, execute:

[ 0 == (SystemRepository restoreStatusInfo at: 13) ]
    whileTrue: [ System _sleepMs: 500 ].

Once this is non-zero, stop the log receiver. Now you may proceed with executing SystemRepository stopContinuousRestore, followed by SystemRepository commitRestore.
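
The wait loop above runs until the failover record arrives. If you would rather not block indefinitely, a bounded variant is one option. The 600-attempt limit (about five minutes at 500 ms per poll) and the status strings are illustrative assumptions, not part of the verified procedure:

```smalltalk
| tries |
tries := 0.
"Poll every 500 ms, giving up after 600 attempts."
[ 0 == (SystemRepository restoreStatusInfo at: 13) and: [ tries < 600 ] ]
    whileTrue: [
        System _sleepMs: 500.
        tries := tries + 1 ].
"Answer a short status string describing the outcome."
0 == (SystemRepository restoreStatusInfo at: 13)
    ifTrue: [ 'failover record not yet replayed' ]
    ifFalse: [ 'failover record replayed' ]
```

Only when the result shows that the failover record has been replayed should you stop the logreceiver and then send stopContinuousRestore and commitRestore.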

When replay completes and the commitRestore has succeeded, the session will be terminated with a RestoreLogSuccess error (4048). At this point, the former slave is ready to be the new master.

6. On the former slave (the future master), perform upgradeImage and any other required upgrade steps.

Once the upgrade has completed, the new master is running on the target version and is available for logins. Application clients should be updated to use the target version’s libraries and NetLDI to log in.

Reactivate the slave

7. Start the logsender on the new master.

8. If you have not already done so, shut down the former master now. Restart the stone and logreceiver of the former master (now the slave system) using the upgrade target version binaries, and start continuous restore using continuousRestoreFromArchiveLogs:. Do not perform upgradeImage.
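
A minimal sketch of restarting continuous restore on the new slave follows. It assumes that continuousRestoreFromArchiveLogs: is sent to SystemRepository with an Array of directory names. The directory shown is a hypothetical placeholder for wherever the logreceiver writes the incoming transaction logs:

```smalltalk
"Run on the new slave after its stone and logreceiver have been restarted.
 Hypothetical path: replace it with the logreceiver's transaction log directory."
SystemRepository continuousRestoreFromArchiveLogs: #( '/gemstone/standby/tranlogs' ).
```

As noted earlier for the slave, an incorrect path at this stage may produce an error without useful details, so check the directory before executing.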

As the transaction records come from the new master (which was upgraded in step 6) and are replayed on the new slave (which was on the origin version), the new slave’s image will be upgraded to the upgrade target version.

There will be a lag between the time the transactions are restored and the completion of the next checkpoint, which is when the version is updated. Until then, logins to the new slave system will continue to see a version mismatch error.