IBM Support

Procedure to maintain Load Balancer high availability when applying maintenance

Troubleshooting


Problem

The Load Balancer high availability feature pairs two Load Balancers to minimize connection failures. Special consideration is necessary to maintain maximum site availability when software maintenance is applied in high availability environments.

Symptom

Load Balancer software maintenance installation procedure in a high availability environment

Cause

The high availability feature monitors the health of the partner Load Balancer by using periodic heartbeats. Each heartbeat message contains the sending Load Balancer's product version. If that version does not match the receiving Load Balancer's version, the heartbeat is not recognized. When the versions do not match, both Load Balancers assume the active forwarding role. Both Load Balancers advertise the cluster addresses, which triggers duplicate address detection. The result is site traffic failures, and the Load Balancers' network interfaces can be disabled.
Active client connections are broken during a high availability takeover unless the partner Load Balancer received connection-specific data before the takeover. That data is provided when high availability replication is enabled. Replication settings control which connection details are transferred from the active Load Balancer to the backup Load Balancer. If a takeover occurs and the partner Load Balancer has no details about a connection, the newly active Load Balancer discards that connection's packets; it forwards only new connections and the existing connections that it has knowledge of. For this reason, when replication is enabled, takeovers must be controlled during Load Balancer initialization.


Environment

WebSphere Load Balancer for IPv4 and IPv6 with high availability (HA) configured

Diagnosing The Problem

Versioning:
The Load Balancer's product versioning is based on the IBM version, release, modification, and fix level structure in a V.R.M.F format. If the installed Load Balancer and the maintenance differ only in the fix level number, there are no special considerations for upgrading due to versioning.  Starting with Load Balancer product version 8.5.5.16 and 9.0.5.1, high availability does not compare the modification level.
Examples:
Primary Load Balancer | Backup Load Balancer | High availability behavior
8.5.0.2  | 8.5.5.15 | Neither Load Balancer recognizes the partner's heartbeat packets. Both Load Balancers run in active forwarding mode. Packet forwarding failures occur.
8.5.5.16 | 8.5.0.0  | The primary Load Balancer recognizes the backup Load Balancer's heartbeats, but the backup Load Balancer does not recognize the primary Load Balancer's heartbeats. The backup Load Balancer operates in active forwarding mode and the primary Load Balancer operates in backup forwarding mode.
9.0.0.0  | 9.0.5.1  | The primary Load Balancer does not recognize the backup Load Balancer's heartbeats, but the backup Load Balancer recognizes the primary Load Balancer's heartbeats. The primary Load Balancer operates in active forwarding mode and the backup Load Balancer operates in backup forwarding mode.
9.0.5.8  | 9.0.5.10 | Both Load Balancers recognize the partner Load Balancer's heartbeat packets.
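The comparison rules in these examples can be sketched as a small model. This is illustrative only, not product code; the function and parameter names are hypothetical, and the cutoff levels 8.5.5.16 and 9.0.5.1 come from the versioning note above.

```python
def parse_vrmf(version):
    """Split a V.R.M.F string such as '8.5.5.16' into an integer tuple."""
    return tuple(int(part) for part in version.split("."))

def ignores_modification(version):
    """From 8.5.5.16 and 9.0.5.1 onward, high availability no longer
    compares the modification level of the partner's version."""
    vrmf = parse_vrmf(version)
    if vrmf[0] == 8:
        return vrmf >= (8, 5, 5, 16)
    return vrmf >= (9, 0, 5, 1)

def recognizes_heartbeat(receiver, sender):
    """Model of whether the receiving Load Balancer accepts the sending
    Load Balancer's heartbeat. The fix level is never compared; the
    modification level is skipped on sufficiently new receivers."""
    rv, rr, rm, _rf = parse_vrmf(receiver)
    sv, sr, sm, _sf = parse_vrmf(sender)
    if (rv, rr) != (sv, sr):
        return False
    return ignores_modification(receiver) or rm == sm

# The example pairings from the table above:
print(recognizes_heartbeat("8.5.0.2", "8.5.5.15"))   # False: 8.5.0.2 still compares the modification level
print(recognizes_heartbeat("8.5.5.16", "8.5.0.0"))   # True: 8.5.5.16 skips the modification level
print(recognizes_heartbeat("8.5.0.0", "8.5.5.16"))   # False: so the backup rejects the primary's heartbeats
print(recognizes_heartbeat("9.0.5.8", "9.0.5.10"))   # True: both sides skip the modification level
```

Note that recognition is directional: each Load Balancer applies its own version's comparison rule to the partner's heartbeats, which is why mixed pairs can end up with both sides active.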
High availability replication:
High availability replication is disabled by default. Replication can be set to send connection records, affinity records, or both affinity and connection records.
If the server selection algorithm is defined as affinity, no connection records are created. If the affinity record was replicated before the takeover and the affinity record is not older than the sticky time, existing connections are maintained after the takeover. The replication strategy must be defined as affinity or both (connection and affinity) for the active Load Balancer to replicate affinity information. The active Load Balancer sends affinity information to the standby Load Balancer when:
  • A client starts a new connection
  • A packet is received on an existing connection and the affinity replication record is older than half the sticky time
When the active Load Balancer transitions into standby mode and affinity replication is enabled, all affinity records for the port are replicated.
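The affinity replication triggers above can be modeled as a simple decision function. This is an illustrative sketch with hypothetical names; times are in seconds.

```python
def send_affinity_record(new_connection, record_age, sticky_time):
    """Model of when the active Load Balancer replicates an affinity
    record: on a new client connection, or when a packet arrives on an
    existing connection and the previously replicated record is older
    than half the sticky time."""
    if new_connection:
        return True
    return record_age > sticky_time / 2

print(send_affinity_record(True, 0, 60))    # True: new connection always replicates
print(send_affinity_record(False, 20, 60))  # False: record is newer than half of 60s
print(send_affinity_record(False, 31, 60))  # True: record is older than half the sticky time
```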
If the server selection algorithm is defined as connection, no affinity records are created. If the connection record was replicated before the takeover and the connection record is not older than the port stale timeout value, existing connections are maintained after the takeover. The port replication strategy must be defined as connection or both (affinity and connection) for the active Load Balancer to replicate connection information. The active Load Balancer sends connection records when:
  • A connection is first established
  • A connection ends (FIN or RST packet received)
  • A packet is received on an existing connection and the connection replication record is older than half the stale timeout value
  • The server is detected offline by the advisor and port reset is enabled
When the active Load Balancer transitions into standby mode and connection replication is enabled, all connection records for the port are replicated.
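The four connection-replication triggers listed above can be sketched the same way. Again, this is an illustrative model with hypothetical names, not product code.

```python
def send_connection_record(event, record_age=0, stale_timeout=0, port_reset=False):
    """Model of the connection-replication triggers: 'event' is one of
    'established', 'fin', 'rst', 'packet', or 'server_down'."""
    if event in ("established", "fin", "rst"):
        return True            # connection start or end always replicates
    if event == "packet":
        # packet on an existing connection: replicate only if the record
        # is older than half the port's stale timeout
        return record_age > stale_timeout / 2
    if event == "server_down":
        # advisor marked the server offline: replicate only if port reset is enabled
        return port_reset
    return False

print(send_connection_record("established"))                                # True
print(send_connection_record("packet", record_age=200, stale_timeout=300))  # True: 200 > 150
print(send_connection_record("server_down", port_reset=False))              # False: port reset disabled
```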
If the server selection algorithm is defined as connection plus affinity and the replication strategy is defined to replicate both affinity and connection records, existing connections are maintained if either record was replicated before the takeover and is not expired.
Connection information is sent to the partner Load Balancer in the heartbeat packets. Each heartbeat can carry 15 connection records.

Resolving The Problem

Applying service with incompatible versions:
If the existing version and the maintenance version are not compatible, it is not possible to maintain existing connections during the software update. Follow the upgrade procedures and update the Load Balancer that is running as standby first. After the software is updated, do not start that Load Balancer until the active Load Balancer is stopped. Stop all Load Balancer processes on the active Load Balancer, start the previously updated Load Balancer, and then apply the software update to the stopped Load Balancer by using the upgrade procedure.
Applying service with compatible versions:
If replication is not enabled, follow the upgrade procedures on the standby Load Balancer first. After the upgraded Load Balancer is restarted and the configuration is loaded, follow the upgrade procedures on the active Load Balancer.
Apply service with high availability connection replication:
When the site is experiencing heavy traffic, do not upgrade either Load Balancer; select a time period with minimal site usage for the upgrade. If the high availability recovery method is defined as automatic, delete the definition and re-add it with a recovery method of manual. The manual recovery method prevents an unnecessary takeover during the upgrade process. Modify the high availability setting on the standby Load Balancer first and then on the active Load Balancer. Follow the upgrade procedure on the standby Load Balancer first. Note the number of active connections on each port with replication enabled on the active Load Balancer at the time that the upgraded Load Balancer finishes loading its configuration. Then determine the ideal wait time before upgrading the active Load Balancer: for each port with replication enabled and more than zero active connections, note the sticky time value of the port if affinity records are replicated and the stale timeout setting of the port if connection records are replicated. The ideal wait time is half the largest value noted. The wait time can be decreased by lowering the largest value on the active Load Balancer only. After the wait time elapses, follow the upgrade procedures on the active Load Balancer. If any values were modified, restore the high availability settings, port sticky time, and port stale timeout values after the upgraded Load Balancer finishes loading its configuration.
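The wait-time calculation described above can be sketched as follows. The port records and field names here are hypothetical, and times are in seconds; the rule itself (half the largest relevant sticky time or stale timeout) comes from the procedure above.

```python
def ideal_wait_seconds(ports):
    """Suggested wait before upgrading the active Load Balancer: half the
    largest sticky time (if affinity records are replicated) or stale
    timeout (if connection records are replicated) among ports that have
    replication enabled and active connections. Returns 0 if none qualify."""
    relevant = []
    for p in ports:
        if p["active_connections"] == 0:
            continue  # ports with no active connections are ignored
        if p["replicate"] in ("affinity", "both"):
            relevant.append(p["sticky_time"])
        if p["replicate"] in ("connection", "both"):
            relevant.append(p["stale_timeout"])
    return max(relevant) / 2 if relevant else 0

# Hypothetical example: one busy port replicating both record types,
# one idle port that does not affect the result.
ports = [
    {"active_connections": 120, "replicate": "both", "sticky_time": 60, "stale_timeout": 300},
    {"active_connections": 0, "replicate": "connection", "sticky_time": 0, "stale_timeout": 900},
]
print(ideal_wait_seconds(ports))  # 150.0: half of the largest relevant value (300)
```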


Document Information

Modified date:
03 January 2022

UID

swg21656604