Recovering a cluster after critical data is damaged

If a failure on one of the database servers in a high-availability cluster damages the root dbspace, the dbspace that contains logical-log files, or the dbspace that contains the physical log, you must treat the failed database server as if it has no data on its disks and is being started for the first time. Use the functioning database server, whose disks are intact, as the source of the data.

Primary server failure

For the following steps, assume that the configuration consists of a primary server named srv_A and an HDR secondary server named srv_B. The steps for restarting an RS cluster are similar.

To restart HDR after a critical media failure:

  1. Check the DRAUTO configuration parameter on srv_B. Its value determines what you do next:
    • If it is set to 0 or 1, you must convert srv_B to the primary server by running the onmode -d make primary command.
    • If it is set to 2, the secondary database server becomes a primary database server as soon as the connection ends when the old primary server fails.
  2. Restore srv_A (the primary database server) from the last dbspace backup.
  3. Use the onmode -d command to set srv_A to an HDR secondary database server and to start HDR.

    The onmode -d command starts a logical recovery from the logical-log files on srv_B. If logical recovery cannot complete because you backed up and freed logical-log files on srv_B, HDR does not start until you perform the next step.

  4. Apply the logical-log files from srv_B (the new primary database server), which were backed up to tape.

The HDR pair is now operational; however, the roles of srv_A and srv_B are swapped. To swap srv_A and srv_B back to their original roles, follow the instructions in Recovering an HDR cluster after the secondary server became the primary server.

Table 1. Steps for restarting HDR after a critical media failure on the primary database server

Step  On the primary database server (srv_A)  On the secondary database server (srv_B)
----  --------------------------------------  ----------------------------------------
1.                                            onmode command:
                                              onmode -d make primary srv_A

2.    ontape command:
      ontape -p
      ON-Bar command:
      onbar -r -p

3.    onmode command:
      onmode -d secondary srv_B

4.    ontape command:
      ontape -l
      ON-Bar command:
      onbar -r -l
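
The sequence in Table 1 can be sketched as a small shell helper that prints the commands for review instead of running them. This is only an illustration under stated assumptions: the function name plan_recovery is hypothetical, the DRAUTO value is passed in by hand rather than read from the server, and the command strings are copied verbatim from Table 1.

```shell
#!/bin/sh
# Hypothetical helper: prints the Table 1 commands for review.
# It does not run anything against a database server.
# Usage: plan_recovery DRAUTO_VALUE BACKUP_TOOL   (BACKUP_TOOL: ontape | onbar)
plan_recovery() {
    drauto=$1
    tool=$2

    # Step 1 (on srv_B): a manual conversion is needed only for DRAUTO 0 or 1.
    case "$drauto" in
        0|1) step1="onmode -d make primary srv_A" ;;
        2)   step1="no command needed (srv_B already became the primary)" ;;
        *)   echo "DRAUTO=$drauto is not covered by this procedure" >&2
             return 1 ;;
    esac

    # Steps 2 and 4 (on srv_A): restore commands depend on the backup tool.
    case "$tool" in
        ontape) physical="ontape -p";   logical="ontape -l" ;;
        onbar)  physical="onbar -r -p"; logical="onbar -r -l" ;;
        *)      echo "unknown backup tool: $tool" >&2
                return 1 ;;
    esac

    echo "step 1, srv_B: $step1"
    echo "step 2, srv_A: $physical"
    echo "step 3, srv_A: onmode -d secondary srv_B"
    # Step 4 applies only if HDR did not start after step 3.
    echo "step 4, srv_A: $logical"
}
```

For example, plan_recovery 0 onbar lists the manual conversion on srv_B followed by the ON-Bar restore commands on srv_A. Both arguments are validated before anything is printed, so an unsupported value never produces a partial plan.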

 

Secondary server failure

If the secondary database server suffers a critical media failure, recover the cluster by following the steps for starting a cluster for the first time.

Primary and secondary server failure

If both of the computers that run the database servers in a replication pair experience a failure that damages the root dbspace, the dbspace that contains logical-log files, or the dbspace that contains the physical log, you must restart the cluster.

To restart a high-availability cluster after a critical media failure on both database servers:
  1. Restore the primary database server from the storage-space and logical-log backups.
  2. After you restore the primary database server, treat the other failed database server as if it had no data on the disks and you were starting the high-availability cluster for the first time.

Copyright© 2019 HCL Technologies Limited