APAR status
Closed as program error.
Error description
When the target system is heavily loaded and ts_bookmark updates are slow as a result, a fatal communication error can occur.

Error messages on the source side:

4127 2014-08-25 04:45:11.448 Agent Reader{3696} com.datamirror.ts.util.TsThread run() Thread end normal
4128 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159} com.datamirror.ts.engine.ReplicationSession sendMessage() COMMS : send failed -- link tag=0x0B0AB is not found
java.lang.Exception COMMS : send failed -- link tag=0x0B0AB is not found
    at com.datamirror.common.comms.v50.Comms.getLink(Comms.java:662)
    at com.datamirror.common.comms.v50.Comms.send(Comms.java:825)
    at com.datamirror.ts.engine.ReplicationSession.sendMessage(ReplicationSession.java:2752)
    at com.datamirror.ts.engine.ReplicationSession.flushDataMessages(ReplicationSession.java:2874)
    at com.datamirror.ts.source.replication.MirrorModerator.moderateForTheSource(MirrorModerator.java:751)
    at com.datamirror.ts.source.replication.ModeratorBase$SourceDataChannelJob.execute(ModeratorBase.java:628)
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
4129 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159} com.datamirror.ts.eventlog.EventLogger logActualEvent() Event logged: ID=1095 MSG=A fatal communication error has occurred.
4130 2014-08-25 04:48:27.167 R_MA_1 Source Data Channel{158} com.datamirror.ts.engine.ReplicationSession sendMessage() COMMS : send failed -- link tag=0x0B0AE is not found
java.lang.Exception COMMS : send failed -- link tag=0x0B0AE is not found
    at com.datamirror.common.comms.v50.Comms.getLink(Comms.java:662)
    at com.datamirror.common.comms.v50.Comms.send(Comms.java:825)
    at com.datamirror.ts.engine.ReplicationSession.sendMessage(ReplicationSession.java:2752)
    at com.datamirror.ts.engine.ReplicationSession.flushDataMessages(ReplicationSession.java:2874)
    at com.datamirror.ts.source.replication.MirrorModerator.moderateForTheSource(MirrorModerator.java:751)
    at com.datamirror.ts.source.replication.ModeratorBase$SourceDataChannelJob.execute(ModeratorBase.java:628)
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
4131 2014-08-25 04:48:27.167 R_MA_1 Source Data Channel{158} com.datamirror.ts.eventlog.EventLogger logActualEvent() Event logged: ID=1095 MSG=A fatal communication error has occurred.

This eventually leads to communication timeouts on the target, such as the following:

172 2014-08-27 18:58:05.540 CQ90573A Target Data Channel{120} com.datamirror.ts.target.publication.TargetDataChannelJob moderateForTheTarget() TS_CQ90573A:D remote closed/ZL2
com.datamirror.common.comms.v50.ConnectionException TS_CQ90573A:D remote closed/ZL2
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.processIOmsg(Channel.java:2342)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runInput(Channel.java:2544)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2628)
    at java.lang.Thread.run(Thread.java:736)
173 2014-08-27 18:58:05.540 CQ90573A Target Data Channel{120} com.datamirror.ts.eventlog.EventLogger logActualEvent() Event logged: ID=1095 MSG=A fatal communication error has occurred. Error: TS_CQ90573A:D remote closed/ZL2
com.datamirror.common.comms.v50.ConnectionException TS_CQ90573A:D remote closed/ZL2
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.processIOmsg(Channel.java:2342)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runInput(Channel.java:2544)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2628)
    at java.lang.Thread.run(Thread.java:736)
175 2014-08-27 18:58:07.543 CQ90573A Target Data Channel{120} com.datamirror.ts.util.PipelineStopRequestState requestAbortStop() Abort with recoverable error.
java.lang.Exception Abort with recoverable error.
    at com.datamirror.ts.util.PipelineStopRequestState.requestAbortStop(PipelineStopRequestState.java:139)
    at com.datamirror.ts.util.PipelineStopRequestState.requestStop(PipelineStopRequestState.java:86)
    at com.datamirror.ts.engine.ReplicationSession.requestTargetEngineShutdown(ReplicationSession.java:3043)
    at com.datamirror.ts.target.publication.TargetPublisherProxy.shutdownTargetRequest(TargetPublisherProxy.java:2012)
    at com.datamirror.ts.target.publication.TargetDataChannelJob.moderateForTheTarget(TargetDataChannelJob.java:126)
    at com.datamirror.ts.target.publication.TargetDataChannelJob.execute(TargetDataChannelJob.java:67)
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
191 2014-08-27 19:06:16.288 CQ90573A Target Control Channel{119} com.datamirror.ts.engine.ReplicationSession$TargetControlChannelThread runThread() DEFAULT_LINK:C output shutdown
com.datamirror.common.comms.v50.ConnectionException DEFAULT_LINK:C output shutdown
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runOutput(Channel.java:2595)
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2627)
    at java.lang.Thread.run(Thread.java:736)
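To check whether a trace shows the same symptom, the "Event logged: ID=1095" entries in the excerpts above can be picked out with a short script. The following is a minimal sketch, not an IBM tool. The field layout is inferred only from the excerpts in this APAR, and treating the third field as a subscription name is an assumption. The names EVENT_RE, find_events and SAMPLE_TRACE are hypothetical.

```python
import re

# Layout inferred from the excerpts in this APAR (an assumption, not a
# documented format):
# <seq> <date> <time> <name> <thread name>{<thread id>} <class> <method> Event logged: ID=<n>
EVENT_RE = re.compile(
    r"(?P<seq>\d+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<subscription>\S+) "          # assumed to be the subscription name
    r"(?P<thread>[^{]+)\{(?P<thread_id>\d+)\} "
    r"\S+ \S+ "                        # logging class and method
    r"Event logged: ID=(?P<event_id>\d+)"
)


def find_events(trace_text, event_id="1095"):
    """Return one dict per 'Event logged' entry with the given event ID."""
    return [
        match.groupdict()
        for match in EVENT_RE.finditer(trace_text)
        if match.group("event_id") == event_id
    ]


# Shortened entries copied from the excerpts above.
SAMPLE_TRACE = (
    "4128 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159} "
    "com.datamirror.ts.engine.ReplicationSession sendMessage() "
    "COMMS : send failed -- link tag=0x0B0AB is not found\n"
    "4129 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159} "
    "com.datamirror.ts.eventlog.EventLogger logActualEvent() "
    "Event logged: ID=1095 MSG=A fatal communication error has occurred.\n"
    "173 2014-08-27 18:58:05.540 CQ90573A Target Data Channel{120} "
    "com.datamirror.ts.eventlog.EventLogger logActualEvent() "
    "Event logged: ID=1095 MSG=A fatal communication error has occurred.\n"
)

if __name__ == "__main__":
    for event in find_events(SAMPLE_TRACE):
        print(event["timestamp"], event["subscription"], event["thread"])
```

Because the pattern is applied with finditer rather than line by line, it should also work on trace text whose line breaks were lost when it was copied.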
Local fix
Increase the value of mirror_check_target_position_interval_seconds so that it is longer than the time a ts_bookmark update takes, then restart the source CDC instance.
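One way to pick a value is to measure how long ts_bookmark updates take on the loaded target and choose an interval above the slowest one. The sketch below only illustrates that arithmetic. The function name, the safety factor of 2 and the sample durations are assumptions for illustration and are not IBM guidance; the only requirement stated in this APAR is that the interval be longer than the update time.

```python
import math


def suggest_interval_seconds(observed_update_seconds, safety_factor=2.0, minimum=1):
    """Suggest a whole-second interval above the slowest observed update.

    observed_update_seconds: measured ts_bookmark update durations, in seconds.
    safety_factor: headroom multiplier (hypothetical default, not from IBM).
    minimum: lower bound for the suggestion, in seconds.
    """
    if not observed_update_seconds:
        raise ValueError("need at least one observed update duration")
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1.0")
    slowest = max(observed_update_seconds)
    # Round up so the suggestion is never below slowest * safety_factor.
    return max(minimum, math.ceil(slowest * safety_factor))


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    durations = [3.2, 7.5, 12.1]
    print(suggest_interval_seconds(durations))
```

If the target load varies over the day, the measurements should cover the busiest period, since the slowest update determines the result.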
Problem summary
IIDR fails with a heartbeat timeout even when there are no network issues. This issue affects customers running IIDR 10.2 and 10.2.1 on any database platform when the target database is performing poorly.
Problem conclusion
This issue is fixed by applying one of the following interim fixes, depending on the database flavor and product version:
- IIDR 10.2 Interim Fix 9 for Netezza
- IIDR 10.2 Interim Fix 8 for DB2 LUW
- IIDR 10.2 Interim Fix 22 for Oracle Redo
- IIDR 10.2.1 Interim Fix 17 for DB2 LUW
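The fix levels listed above can be expressed as a small lookup for checking an installation. This is a sketch only: it assumes that a later interim fix on the same version and database flavor also contains this fix, which this APAR does not state, and the names MIN_INTERIM_FIX and has_fix are hypothetical.

```python
# Minimum interim fix per (product version, database flavor), as listed in
# this APAR. Combinations not listed here are not covered by the APAR text.
MIN_INTERIM_FIX = {
    ("10.2", "Netezza"): 9,
    ("10.2", "DB2 LUW"): 8,
    ("10.2", "Oracle Redo"): 22,
    ("10.2.1", "DB2 LUW"): 17,
}


def has_fix(version, database, installed_interim_fix):
    """Return True/False for listed combinations, None for unlisted ones.

    Assumes interim fixes are cumulative (an assumption, not confirmed here).
    """
    required = MIN_INTERIM_FIX.get((version, database))
    if required is None:
        return None
    return installed_interim_fix >= required


if __name__ == "__main__":
    print(has_fix("10.2", "Oracle Redo", 20))
    print(has_fix("10.2.1", "DB2 LUW", 17))
```

A result of None means the combination is not listed in this APAR, so the fix level for it has to be confirmed with IBM support.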
Temporary fix
Increase the value of mirror_check_target_position_interval_seconds and restart the source CDC instance.
Comments
APAR Information
APAR number
JR51318
Reported component name
IS DATA REPLICA
Reported component ID
5725E3000
Reported release
A20
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2014-09-21
Closed date
2014-11-27
Last modified date
2014-11-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IS DATA REPLICA
Fixed component ID
5725E3000
Applicable component levels
RA20 PSY
UP
Document Information
Modified date:
27 November 2014