Operation of alternate group support for connections to Db2 on Linux, UNIX, and Windows systems

Alternate group support is a high-availability feature that allows the IBM® Data Server Driver for JDBC and SQLJ to execute an application workload on one group at a time. A group can be a primary group or one of several alternate groups.

Each group is a Db2 on Linux, UNIX, and Windows systems instance, which can be single-member instance, or a multiple-member instance, such as a Db2 pureScale® instance.

Before you enable seamless failover to an alternate group, you need to enable automatic client reroute to an alternate group by setting Connection or DataSource property enableSysplexWLB to true. Then you indicate that seamless failover to an alternate group will be used by setting Connection or DataSource property enableAlternateGroupSeamlessACR to true.

You enable alternate group support by providing the addresses of the alternate groups in the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName Connection or DataSource properties. You provide the address of the primary group in the serverName, portNumber, and databaseName Connection or DataSource properties.

In an HADR environment, you enable alternate group support by providing the addresses of the standby clusters in the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName Connection or DataSource properties. The address for the primary connection is the address of the primary cluster.

To avoid unpredictable client reroute behavior, follow these practices:

  • Do not specify the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties, and also execute the Db2 on Linux, UNIX, and Windows systems command UPDATE ALTERNATE SERVER FOR DATABASE on the data server.
  • When you set up a Db2 pureScale instance that consists of two HADR clusters, execute the command UPDATE ALTERNATE SERVER FOR DATABASE command, but do not specify the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties.
  • When you set up a Db2 pureScale instance that consists of more than two HADR clusters, specify the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties, but do not execute the command UPDATE ALTERNATE SERVER FOR DATABASE command.

Alternate group support allows failover from the primary group to any alternate group. After a connection on a DataSource instance fails over to an alternate group, subsequent connections on the DataSource instance connect directly to that alternate group. If an alternate group becomes unavailable during a connection, the connection can fail over to any other alternate group. A group is unavailable if any of the following conditions are true:

  • The IBM Data Server Driver for JDBC and SQLJ receives a communication failure from all members of the group.
  • The members of the group return SQL errors to the IBM Data Server Driver for JDBC and SQLJ to indicate that the driver should drop the connection to that group. For example, if the role of a primary HADR database changes to a standby role, the database server might return an SQL error to the driver that indicates that the request cannot be issued on an HADR standby database. The driver interprets that SQL error as a group failure.

Alternate group support operates in the following way:

  • For the first connection, the process is as follows:
    1. After the first connection to the primary group fails, the driver attempts to connect the application to the alternate group that is specified by the first set of values in the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties.
    2. The driver attempts to connect to the alternate group. The number of attempts and the amount of time between retries depends on whether a cached server list exists on the client, and whether maxRetriesForClientReroute and retryIntervalForClientReroute are set:
      • If a cached server list does not exist, and maxRetriesForClientReroute and retryIntervalForClientReroute are not set, the driver makes at most five attempts to connect to the alternate group, with no time between retries.
      • If a cached server list exists, and maxRetriesForClientReroute and retryIntervalForClientReroute are not set, the driver retries the connection for up to two minutes.
      • If maxRetriesForClientReroute and retryIntervalForClientReroute are set, the driver retries the connection the number of times that is specified by maxRetriesForClientReroute, with an interval between retries that is specified by retryIntervalForClientReroute.
    3. If all attempts to connect to the first alternate group fail, the driver attempts to connect the application to the alternate group that is specified by the next set of values in the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties. The driver uses the same rules for the number of retries and interval between retries as it uses for the connection to the first alternate group.

      The driver continues this process until all alternate groups have been tried, or a connection has been established.

    4. If a connection is not established after all alternate group members have been tried, the driver returns SQL error -4499 to the application.
  • After a connection has been established, the process is as follows:
    1. If an error occurs for which automatic client reroute can be performed, the IBM Data Server Driver for JDBC and SQLJ attempts to reconnect to the same group. The amount of time that the driver spends on retries or the number of retries depends on the maxRetriesForClientReroute property setting. However, if the data server returns an SQL error that the driver interprets as a group failure, such as SQL error -1776, the driver does not retry the connection to the same group. SQL error -1776 indicates that the driver is attempting to connect to a standby instance.
    2. If reconnection to the same group is unsuccessful, the driver attempts to connect to each alternate group that is specified by the alternateGroupServerName, alternateGroupPortNumber, and alternateGroupDatabaseName properties, starting with the next alternate group in the list after the one that failed. The amount of time that the driver spends on retries or the number of retries depends on the maxRetriesForClientReroute property setting. If the last group in the alternate server list has been tried, the driver attempts to connect to the first server in the alternate group list, and continues through the list again.
    3. If a connection is not established, the driver returns SQL error -4499 to the application.

    maxRetriesForClientReroute controls the amount of time that the driver spends on retries and the number of retries in the following way:

    • If maxRetriesForClientReroute is not set, the driver tries to connect to each group in the list until one of the following events occurs:
      • The connection succeeds.
      • The connection has been retried for two minutes.
      • The data server returns SQL error -1776.

      If the connection is unsuccessful after two minutes, or if the data server returns SQL error -1776, the driver tries the connection to the next alternate server. This process continues until (2*number-of-groups) minutes have elapsed.

    • If maxRetriesForClientReroute is set, the driver tries to connect to each group in the list until one of the following events occurs:
      • The connection succeeds.
      • The driver has attempted to connect to all members of the group the number of times that is specified by maxRetriesForClientReroute.
      • The data server returns SQL error -1776.

      If the connection is unsuccessful after the number of retries exceeds the value that is specified by maxRetriesForClientReroute, or if the data server returns SQL error -1776, the driver tries the connection to the next alternate server. This process continues until (maxRetriesForClientReroute*number-of-groups) attempts have occurred.

Examples

Suppose that three groups are defined as shown in the following table:

Group Server type Members in group
A Primary A1, A2
B Alternate B1
C Alternate C1, C2, C3®

The address of A1 is the primary group address. The addresses of B1 and C1 are the alternate group addresses. Workload balancing and automatic client reroute are enabled.

The following example shows how properties are set on a DataSource object to enable alternate group support and define the alternate group list:

com.ibm.db2.jcc.DB2SimpleDataSource ds =       // Create DB2SimpleDataSource object
  new com.ibm.db2.jcc.DB2SimpleDataSource();
…                                              // Set other properties
ds.setDatabaseName("mydb2a");                  // Set primary group database
                                               // for group A
ds.setServerName("myservera.ibm.com");         // Set primary server name
                                               // for group A
ds.setPortNumber(5912);                        // Set primary port number
                                               // for group A
ds.setEnableSysplexWLB(true);                  // Enable automatic client reroute to an
                                               // alternate group by setting property
                                               // enableSysplexWLB to true
ds.setAlternateGroupDatabaseName("mydb2b,mydb2c");
                                               // Set alternate group databases
                                               // for groups B and C
ds.setAlternateGroupServerName("myserverb.ibm.com,myserverc.ibm.com");
                                               // Set alternate group server names
                                               // for groups B and C
ds.setAlternateGroupPortNumber(5912,5912);
                                               // Set alternate group port numbers
                                               // for groups B and C

Example: Alternate group failover when maxRetriesForClientReroute is not set: The following steps demonstrate an alternate group scenario when maxRetriesForClientReroute is not set:

  1. The driver connects to group A.
  2. While A1 is executing an SQL statement, A1 returns a communication error to the driver.
  3. The driver unsuccessfully retries the connection to the members in group A for two minutes.
  4. The driver tries the connection to group B (server B1), which is the first server in the alternate group list. The driver tries to connect for two minutes. The connection is unsuccessful.
  5. The driver tries the connection to group C, which is the next server in the alternate group list.
  6. The connection to group C succeeds.
  7. SQL work is performed on the connection to group C.
  8. While C2 is executing an SQL statement, C2 returns SQL error -1776 to the driver, indicating that it has switched to a standby role.
  9. The driver retries the connection to the members in group A, which has taken over the primary role. The connection to group A succeeds.
  10. While A1 is executing an SQL statement, A1 returns a communication error to the driver.
  11. The driver unsuccessfully retries the connection to the members in group A for two minutes.
  12. The driver unsuccessfully tries the connection to group B (server B1) for two minutes.
  13. The driver unsuccessfully retries the connection to the members in group C for two minutes.
  14. The driver has retried for six minutes (three groups * two minutes per group) so it returns an SQLException with SQL error -4499 to the application.

Example: Alternate group failover when maxRetriesForClientReroute is set: The following steps demonstrate an alternate group scenario when maxRetriesForClientReroute is set to 2:

  1. The driver connects to group A.
  2. While A1 is executing an SQL statement, A1 returns a communication error to the driver.
  3. The driver unsuccessfully retries the connection to the members in group A twice.
  4. The driver tries the connection to group B (server B1), which is the first server in the alternate group list. The driver tries to connect twice. The connection is unsuccessful.
  5. The driver tries the connection to group C (server C1), which is the next server in the alternate group list.
  6. The connection to group C succeeds.
  7. SQL work is performed on the connection to group C.
  8. While C2 is executing an SQL statement, C2 returns SQL error -1776 to the driver, indicating that it has switched to a standby role.
  9. The driver retries the connection to the members in group A, which has taken over the primary role. The connection succeeds.
  10. While A1 is executing an SQL statement, A1 returns a communication error to the driver.
  11. The driver unsuccessfully retries the connection to the members in group A twice.
  12. The driver unsuccessfully tries the connection to group B (server B1) twice.
  13. The driver unsuccessfully retries the connection to the members in group C twice.
  14. The driver has retried six times (three groups * two retries per group) so it returns an SQLException with SQL error -4499 to the application.