APAR status
Closed as program error.
Error description
The fix for IC79798 on an updatable HDR secondary tied the SMX timeouts and HDR ping timeouts together internally. Once an HDR pair is up and operational with the fix for IC79798, the servers' HDR ping timeouts and SMX thread timeouts should be in sync.

However, during the reconnect phase of HDR (when the primary is shipping log files to the secondary to bring it up to the current position), the primary server does not send pings, so no ping timeout can occur. If a temporary network glitch happens in this phase, HDR stays connected, log shipping finishes, and the servers catch up and enter the operational state (at which point HDR pinging starts). But if the temporary problem lasts long enough that the servers would have ping timed out had they been pinging, the SMX timeout can still happen. If it does, the updatable secondary is then stuck in its updates-blocked state.

This is what you would see in the MSGPATH files on the primary and secondary servers:

Primary:
08:04:48 DR: Primary server connected
08:04:48 DR: Using default behavior of failure-recovering Secondary server
08:04:50 DR: Sending log 4, size 2500 pages, 100.00 percent used
08:04:51 DR: Sending log 5, size 2500 pages, 100.00 percent used
08:04:53 DR: Sending log 6, size 2500 pages, 100.00 percent used
08:04:55 DR: Sending log 7, size 2500 pages, 100.00 percent used
08:04:57 DR: Sending log 8, size 2500 pages, 100.00 percent used
08:07:00 SMX thread is exiting
08:07:00 SMX thread is exiting
08:07:01 DR: Sending log 9, size 2500 pages, 100.00 percent used
08:07:02 DR: Sending log 10, size 2500 pages, 100.00 percent used
08:07:05 Logical Log 11 Complete, timestamp: 0x7f90f.
08:07:05 DR: Sending log 11, size 2500 pages, 30.40 percent used
08:07:05 DR: Sending log 12 (current), size 2500 pages, 0.16 percent used
08:07:07 DR: Sending Logical Logs Completed
08:07:08 DR: Primary server operational

The "SMX thread is exiting" message appears at 08:07:00. The network issue began after 08:04:57 and resolved itself at 08:07:01, when the next "Sending log" message appears.

Secondary:
08:04:47 DR: Secondary server connected ...
08:04:49 DR: Failure recovery from disk in progress ...
08:04:49 Logical Recovery Started.
08:04:49 17 recovery worker threads will be started.
08:04:49 Start Logical Recovery - Start Log 4, End Log ?
08:04:49 Starting Log Position - 4 0x1e018
08:04:50 Started processing open transactions on secondary during startup
08:04:50 Finished processing open transactions on secondary during startup.
08:04:50 DR: HDR secondary server operational
08:04:50 Memory sizes:resident:217760 KB, virtual:57232 KB, no SHMTOTAL limit
08:04:52 Logical Log 4 Complete, timestamp: 0x35e48.
08:04:54 Logical Log 5 Complete, timestamp: 0x42e3e.
08:04:56 Logical Log 6 Complete, timestamp: 0x4e8b3.
08:04:58 Logical Log 7 Complete, timestamp: 0x599f9.
08:06:00 SMX thread is exiting because the timeout period of 10 seconds has elapsed. Use the IFX_SMX_TIMEOUT environment variable to set the timeout period.
08:06:02 SMX thread is exiting because the timeout period of 10 seconds has elapsed. Use the IFX_SMX_TIMEOUT environment variable to set the timeout period.
08:07:00 Updates from secondary currently not allowed
08:07:00 Updates from secondary currently not allowed
08:07:01 Logical Log 8 Complete, timestamp: 0x64b5a.
08:07:02 Logical Log 9 Complete, timestamp: 0x7181c.
08:07:05 Logical Log 10 Complete, timestamp: 0x7e02e.
08:07:06 Logical Log 11 Complete, timestamp: 0x7f940.
08:07:08 B-tree scanners disabled.
08:07:10 Checkpoint Completed: duration was 0 seconds.

DRTIMEOUT was set to 10, so if the primary server had been pinging, it would have timed out after 40 seconds of interruption. Instead, the interruption lasted about 2 minutes (120 seconds) without an HDR ping timeout, because the primary does not send pings until after it logs its "DR: Primary server operational" message to MSGPATH.
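
The timing argument above can be checked against a message log with a short script. The following is a minimal Python sketch, not an IBM-supplied tool: the function names are hypothetical, the factor of 4 is an assumption inferred from the 10-to-40-second figures in the text, and the log lines are copied from the primary excerpt. It assumes timestamps are in HH:MM:SS form and do not cross midnight.

```python
from datetime import datetime

# Assumption: the 40-second figure in the text equals 4 x DRTIMEOUT.
PING_ATTEMPTS = 4


def ping_timeout_seconds(drtimeout):
    """Seconds of interruption after which an HDR ping timeout would occur."""
    return PING_ATTEMPTS * drtimeout


def longest_send_gap(lines):
    """Largest gap, in seconds, between consecutive 'DR: Sending log' messages."""
    times = [
        datetime.strptime(line.split()[0], "%H:%M:%S")
        for line in lines
        if "DR: Sending log" in line
    ]
    return max(
        (int((later - earlier).total_seconds())
         for earlier, later in zip(times, times[1:])),
        default=0,
    )


# Lines taken from the primary server's MSGPATH excerpt.
PRIMARY_LOG = """\
08:04:55 DR: Sending log 7, size 2500 pages, 100.00 percent used
08:04:57 DR: Sending log 8, size 2500 pages, 100.00 percent used
08:07:00 SMX thread is exiting
08:07:01 DR: Sending log 9, size 2500 pages, 100.00 percent used
""".splitlines()

gap = longest_send_gap(PRIMARY_LOG)
limit = ping_timeout_seconds(10)  # DRTIMEOUT was 10 in this case
print(f"longest gap {gap}s, ping timeout would be {limit}s, exceeded: {gap > limit}")
```

For the excerpt shown, the gap between the log 8 and log 9 messages is 124 seconds, well beyond the 40 seconds at which a pinging primary would have timed out.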
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* Users of a HDR with UPDATABLE_SECONDARY > 0 in certain cases *
* when temporary network interruptions occur.                  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Update to IDS-11.70.xC8                                      *
****************************************************************
Problem conclusion
Problem Fixed In IDS-11.70.xC8
Temporary fix
Comments
APAR Information
APAR number
IC95562
Reported component name
INFORMIX SERVER
Reported component ID
5725A3900
Reported release
B70
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2013-08-28
Closed date
2014-02-26
Last modified date
2024-09-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
INFORMIX SERVER
Fixed component ID
5725A3900
Applicable component levels
Document Information
Modified date:
24 September 2024