APAR status
Closed as program error.
Error description
A secondary server needs failure recovery, and it cannot reach the primary server. when starting the secondary we get: 14:50:08 Physical Recovery Started at Page (1:8442). 14:50:08 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 14:50:08 DR: Trying to connect to primary server = ol_pr1150fc6 14:50:09 DR: Cannot connect to primary server 14:50:09 DR: Turned off on secondary server 14:50:09 Dataskip is now OFF for all dbspaces 14:50:09 Restartable Restore has been ENABLED 14:50:09 Recovery Mode the secondary hangs there. now we do an ontape -l to get it online: C:\sqldists\1150fc6>ontape -l Roll forward should start with log number 111 Restore will use log backup file c:\ltape\chokmah_9_Log0000000111. Press Return to continue ... Rollforward log file c:\ltape\chokmah_9_Log0000000111 ... buc_fe.c : Archive API processing failed at line 908 for msgtype Do you want to suspend (y), commit (n) or abort (CTRL/C) logical restore? this will lead to the following: 14:53:58 10 recovery worker threads will be started. 14:53:58 Start Logical Recovery - Start Log 111, End Log ? 14:53:58 Starting Log Position - 111 0x436110 14:54:02 rsrecvr.c 4585 set EOF 14:54:02 No logical log restore will be performed. 14:54:02 Clearing the physical and logical logs has started 14:54:04 Cleared 103 MB of the physical and logical logs in 2 seconds 14:54:05 Physical Recovery Started at Page (1:7142). 14:54:05 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 14:54:05 Logical Recovery Started. 14:54:05 10 recovery worker threads will be started. 14:54:05 Started processing open transactions on secondary during startup 14:54:06 Finished processing open transactions on secondary during startup. 14:54:06 DR: HDR secondary server operational 14:54:06 Logical Recovery has reached the transaction cleanup phase. 14:54:06 Logical Recovery Complete. 13 Committed, 1 Rolled Back, 0 Open, 0 Bad Locks 14:54:06 Begin recreating indexes deferred during recovery. 14:54:06 Bringing system to Quiescent Mode with no Logical Restore. 14:54:07 Quiescent Mode 14:54:07 Completed recreating indexes. 14:54:08 Checkpoint Completed: duration was 0 seconds. 14:54:08 Tue Feb 23 - loguniq 111, logpos 0x453018, timestamp: 0x1ae7cb2 Interval: 2964 14:54:08 Maximum server connections 1 14:54:08 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 1, Plog used ontape will bring the engine to quiescent mode if the primary comes online again, the secondary states: 15:49:32 DR: Secondary server connected 15:49:32 Updates from secondary allowed 15:49:32 Updates from secondary allowed 15:49:32 DR: Secondary server needs failure recovery 15:49:35 DR: Start failure recovery from tape ... 15:49:40 DR: Multi-User mode 15:49:41 Recovery Mode 15:49:44 Updates from secondary currently not allowed 15:49:44 Updates from secondary currently not allowed 15:49:44 Updates from secondary currently not allowed 15:49:44 Updates from secondary currently not allowed if you do an ontape -l the following will happen: 15:55:30 DR: Start failure recovery from tape ... 15:55:31 Logical Recovery Started. 15:55:31 10 recovery worker threads will be started. 15:55:31 Start Logical Recovery - Start Log 111, End Log ? 15:55:31 Starting Log Position - 111 0x453018 15:55:34 Requested shared memory segment size rounded from 532596KB to 532608KB 15:55:34 Dynamically allocated new virtual shared memory segment (size 532608KB) 15:55:34 Memory sizes:resident:47616 KB, virtual:565312 KB, no SHMTOTAL limit 15:55:34 Checkpoint Record not Found in Logical Log. 15:55:34 Assert Failed: Invalid Mutex Type 15:55:34 IBM Informix Dynamic Server Version 11.50.FC6 15:55:34 Who: Session(15, informix@HDR1, 1540, 0000000000000000) Thread(38, ontape, 0, 1) File: mt.c Line: 8617 15:55:34 stack trace for pid 1568 written to C:\sqldists\1150fc6\tmp\af.40eec66 15:55:42 See Also: C:\sqldists\1150fc6\tmp\af.40eec66, shmem.40eec66.0 15:55:43 Updates from secondary currently not allowed 15:55:43 Updates from secondary currently not allowed 15:55:43 Updates from secondary currently not allowed 15:55:43 Updates from secondary currently not allowed this is the stack of the af file: 15:55:34 Stack for thread: 38 ontape base: 0x0000000083870000 len: 69632 pc: 0x0000000140b58cd4 tos: 0x000000008387edf0 state: running vp: 1 0x0000000140B59C1A (oninit)afhandler 0x0000000140B5A4AF (oninit)afcrash_interface 0x0000000140B4A18C (oninit)mt_lock 0x000000014057EF88 (oninit)_igetstreamQ 0x000000014070F6C2 (oninit)rswrite_lgr 0x0000000140288A5F (oninit)tbj_log_write_restore 0x0000000140061228 (oninit)sqmain 0x0000000140BD0D20 (oninit)listen_verify 0x0000000140BD220E (oninit)spawn_thread 0x0000000140B8616A (oninit)startup IFX_ONTAPE_FILE_PRE and the ontape will also fail with an error message C:\Windows\system32>ontape -l Roll forward should start with log number 111 Restore will use log backup file c:\ltape\chokmah_9_Log0000000111. Press Return to continue ... Rollforward log file c:\ltape\chokmah_9_Log0000000111 ... buc_fe.c : Archive API processing failed at line 908 for msgtype Do you want to suspend (y), commit (n) or abort (CTRL/C) logical restore? Do you want to suspend (y), commit (n) or abort (CTRL/C) logical restore? y Program over. the secondary is now corrupted, it cannot even be patched, it needs to be re-initialized. ontape should check whether or not it is running on a secondary and not do the cleaning of logical logs and going to quiescent mode.
Local fix
Problem summary
**************************************************************** * USERS AFFECTED: * * Users of HDR environment. * **************************************************************** * PROBLEM DESCRIPTION: * * Performing log restore (ontape -l, or via onbar) on a HDR * * secondary server when the primary server is unavailable can * * cause the HDR pair to fail synchronization when the primary * * is back online. * **************************************************************** * RECOMMENDATION: * * Upgrade to 11.50.xC9 when available. * ****************************************************************
Problem conclusion
Problem fixed in 11.50.xC9.
Temporary fix
Comments
APAR Information
APAR number
IC68160
Reported component name
IBM IDS ENTRP E
Reported component ID
5724L2304
Reported release
B15
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-04-25
Closed date
2011-09-27
Last modified date
2011-09-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM IDS ENTRP E
Fixed component ID
5724L2304
Applicable component levels
RB15 PSY
UP
[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSGU8G","label":"Informix Servers"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"B15","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]
Document Information
Modified date:
27 September 2011