IBM Support

JR51318: SOURCE SIDE CDC ENCOUNTERS A HEARTBEAT TIME OUT EVEN THOUGH THERE IS NO REAL NETWORK ISSUE.

APAR status

  • Closed as program error.

Error description

  • When the target system is under heavy load and ts_bookmark
    updates are slow as a result, a fatal communication error can
    occur even though the network itself is healthy.
    
    - Error messages in the source-side trace:
    
    4127 2014-08-25 04:45:11.448 Agent Reader{3696}
    com.datamirror.ts.util.TsThread run() Thread end normal
    4128 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159}
    com.datamirror.ts.engine.ReplicationSession sendMessage()
    COMMS : send failed -- link tag=0x0B0AB is not found?java.lang.Exception
    COMMS : send failed -- link tag=0x0B0AB is not found|
    at com.datamirror.common.comms.v50.Comms.getLink(Comms.java:662)|
    at com.datamirror.common.comms.v50.Comms.send(Comms.java:825)|
    at com.datamirror.ts.engine.ReplicationSession.sendMessage(ReplicationSession.java:2752)|
    at com.datamirror.ts.engine.ReplicationSession.flushDataMessages(ReplicationSession.java:2874)|
    at com.datamirror.ts.source.replication.MirrorModerator.moderateForTheSource(MirrorModerator.java:751)|
    at com.datamirror.ts.source.replication.ModeratorBase$SourceDataChannelJob.execute(ModeratorBase.java:628)|
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)|
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
    4129 2014-08-25 04:48:27.167 R_MA_3 Source Data Channel{159}
    com.datamirror.ts.eventlog.EventLogger logActualEvent()
    Event logged: ID=1095 MSG=A fatal communication error has occurred.
    4130 2014-08-25 04:48:27.167 R_MA_1 Source Data Channel{158}
    com.datamirror.ts.engine.ReplicationSession sendMessage()
    COMMS : send failed -- link tag=0x0B0AE is not found?java.lang.Exception
    COMMS : send failed -- link tag=0x0B0AE is not found|
    at com.datamirror.common.comms.v50.Comms.getLink(Comms.java:662)|
    at com.datamirror.common.comms.v50.Comms.send(Comms.java:825)|
    at com.datamirror.ts.engine.ReplicationSession.sendMessage(ReplicationSession.java:2752)|
    at com.datamirror.ts.engine.ReplicationSession.flushDataMessages(ReplicationSession.java:2874)|
    at com.datamirror.ts.source.replication.MirrorModerator.moderateForTheSource(MirrorModerator.java:751)|
    at com.datamirror.ts.source.replication.ModeratorBase$SourceDataChannelJob.execute(ModeratorBase.java:628)|
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)|
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
    4131 2014-08-25 04:48:27.167 R_MA_1 Source Data Channel{158}
    com.datamirror.ts.eventlog.EventLogger logActualEvent()
    Event logged: ID=1095 MSG=A fatal communication error has occurred.
    
    
    This eventually leads to communication timeouts on the target,
    such as the following:
    
    172 2014-08-27 18:58:05.540 CQ90573A Target Data Channel{120}
    com.datamirror.ts.target.publication.TargetDataChannelJob moderateForTheTarget()
    TS_CQ90573A:D remote closed/ZL2?com.datamirror.common.comms.v50.ConnectionException
    TS_CQ90573A:D remote closed/ZL2|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.processIOmsg(Channel.java:2342)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runInput(Channel.java:2544)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2628)|
    at java.lang.Thread.run(Thread.java:736)
    173 2014-08-27 18:58:05.540 CQ90573A Target Data Channel{120}
    com.datamirror.ts.eventlog.EventLogger logActualEvent()
    Event logged: ID=1095 MSG=A fatal communication error has occurred.
    Error: TS_CQ90573A:D remote closed/ZL2?com.datamirror.common.comms.v50.ConnectionException
    TS_CQ90573A:D remote closed/ZL2|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.processIOmsg(Channel.java:2342)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runInput(Channel.java:2544)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2628)|
    at java.lang.Thread.run(Thread.java:736)
    175 2014-08-27 18:58:07.543 CQ90573A Target Data Channel{120}
    com.datamirror.ts.util.PipelineStopRequestState requestAbortStop()
    Abort with recoverable error.?java.lang.Exception Abort with recoverable error.|
    at com.datamirror.ts.util.PipelineStopRequestState.requestAbortStop(PipelineStopRequestState.java:139)|
    at com.datamirror.ts.util.PipelineStopRequestState.requestStop(PipelineStopRequestState.java:86)|
    at com.datamirror.ts.engine.ReplicationSession.requestTargetEngineShutdown(ReplicationSession.java:3043)|
    at com.datamirror.ts.target.publication.TargetPublisherProxy.shutdownTargetRequest(TargetPublisherProxy.java:2012)|
    at com.datamirror.ts.target.publication.TargetDataChannelJob.moderateForTheTarget(TargetDataChannelJob.java:126)|
    at com.datamirror.ts.target.publication.TargetDataChannelJob.execute(TargetDataChannelJob.java:67)|
    at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)|
    at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
    191 2014-08-27 19:06:16.288 CQ90573A Target Control Channel{119}
    com.datamirror.ts.engine.ReplicationSession$TargetControlChannelThread runThread()
    DEFAULT_LINK:C output shutdown?com.datamirror.common.comms.v50.ConnectionException
    DEFAULT_LINK:C output shutdown|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.notifyApp(Monitor.java:302)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.failConversations(Monitor.java:902)|
    at com.datamirror.common.comms.v50.CommsSystem.Monitor.removeLink(Monitor.java:951)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.runOutput(Channel.java:2595)|
    at com.datamirror.common.comms.v50.CommsSystem.Channel.run(Channel.java:2627)|
    at java.lang.Thread.run(Thread.java:736)
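
To check whether a trace file shows the same signature, the excerpts above can be scanned for the two markers they contain: the "link tag=... is not found" send failure and the event "ID=1095". The following Python sketch is illustrative only and is not part of the product. The names scan_trace and Finding are hypothetical, and the entry layout (sequence number, timestamp, thread name with an id in braces) is assumed from the excerpts in this APAR rather than from a documented trace format.

```python
import re
from dataclasses import dataclass, field

# Start of a trace entry as seen in the excerpts: sequence number,
# timestamp, then a thread name ending in "{id}". Assumed layout.
ENTRY_START = re.compile(
    r"\b(?P<seq>\d+) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>.+?)\{(?P<tid>\d+)\}"
)
LINK_TAG = re.compile(r"link tag=(0x[0-9A-Fa-f]+) is not found")
FATAL_EVENT = "ID=1095"  # "A fatal communication error has occurred."


@dataclass
class Finding:
    seq: int
    timestamp: str
    thread: str
    fatal_event: bool
    missing_link_tags: list = field(default_factory=list)


def scan_trace(text):
    """Return the entries that mention a missing link tag or event 1095."""
    # Collapse line wrapping so markers split across lines still match.
    flat = " ".join(text.split())
    starts = list(ENTRY_START.finditer(flat))
    findings = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(flat)
        body = flat[match.end():end]
        tags = sorted(set(LINK_TAG.findall(body)))
        fatal = FATAL_EVENT in body
        if tags or fatal:
            findings.append(
                Finding(int(match["seq"]), match["ts"],
                        match["thread"].strip(), fatal, tags)
            )
    return findings
```

A missing link tag followed shortly by event 1095 on the same source data channel thread matches the pattern described here; the markers alone do not rule out a genuine network fault, so the result is a hint, not a diagnosis.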
    

Local fix

  • Increase the value of
    mirror_check_target_position_interval_seconds so that it is
    longer than the time a ts_bookmark update takes, then restart
    the source CDC instance.
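
One way to pick a value is to measure how long ts_bookmark updates take on the loaded target and choose an interval strictly longer than the slowest one. The Python sketch below is only an illustration of that arithmetic. The function name suggest_interval_seconds and the 1.5 safety margin are assumptions made here, not IBM guidance, and the update durations are assumed to come from your own database monitoring.

```python
import math


def suggest_interval_seconds(update_durations, current_interval, margin=1.5):
    """Suggest a whole-second value for
    mirror_check_target_position_interval_seconds.

    update_durations: observed ts_bookmark update times in seconds
                      (measured by the operator; hypothetical input).
    current_interval: the value currently configured, in seconds.
    margin:           safety factor >= 1.0 (an assumption, not a product rule).
    """
    if not update_durations:
        raise ValueError("need at least one measured update duration")
    if margin < 1.0:
        raise ValueError("margin must be at least 1.0")
    slowest = max(update_durations)
    # floor(...) + 1 is strictly greater than slowest * margin,
    # and therefore strictly greater than the slowest update.
    candidate = math.floor(slowest * margin) + 1
    # The local fix only ever increases the value, so never go lower.
    return max(candidate, current_interval)


if __name__ == "__main__":
    print(suggest_interval_seconds([12.4, 48.0, 95.2], current_interval=60))
```

The suggested value still has to be set on the source instance and takes effect only after the source CDC instance is restarted, as stated above.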
    

Problem summary

  • IIDR fails with a heartbeat timeout even when there are no
    network issues.
    
    This issue affects customers running IIDR 10.2 and 10.2.1 with
    any database type when the target database is performing
    poorly.
    

Problem conclusion

  • This issue is fixed by applying the following interim fixes
    depending on the database flavor and product version:
    - IIDR 10.2 Interim Fix 9 for Netezza; or
    - IIDR 10.2 Interim Fix 8 for DB2 LUW; or
    - IIDR 10.2 Interim Fix 22 for Oracle Redo; or
    - IIDR 10.2.1 Interim Fix 17 for DB2 LUW.
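
The list above can be expressed as a small lookup for use in an inventory or pre-upgrade check. This Python sketch is illustrative. The names INTERIM_FIXES, required_interim_fix and has_fix are hypothetical, and treating a higher interim fix number as including this fix is an assumption that should be confirmed against the fix readme.

```python
# (product version, database) -> first interim fix listed in this APAR
INTERIM_FIXES = {
    ("10.2", "Netezza"): 9,
    ("10.2", "DB2 LUW"): 8,
    ("10.2", "Oracle Redo"): 22,
    ("10.2.1", "DB2 LUW"): 17,
}


def required_interim_fix(version, database):
    """Return the interim fix number from the APAR, or None if the
    combination is not listed there."""
    return INTERIM_FIXES.get((version, database))


def has_fix(version, database, installed_interim_fix):
    """Return True/False for listed combinations, None otherwise.

    Assumes interim fixes are cumulative, so any installed number at
    or above the listed one contains the fix (an assumption).
    """
    needed = required_interim_fix(version, database)
    if needed is None:
        return None
    return installed_interim_fix >= needed
```

For example, has_fix("10.2", "Oracle Redo", 20) returns False, so that instance would still need the local fix until Interim Fix 22 or later is applied.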
    

Temporary fix

  • Increase the value of
    mirror_check_target_position_interval_seconds and restart the
    source CDC instance.
    

Comments

APAR Information

  • APAR number

    JR51318

  • Reported component name

    IS DATA REPLICA

  • Reported component ID

    5725E3000

  • Reported release

    A20

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-09-21

  • Closed date

    2014-11-27

  • Last modified date

    2014-11-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IS DATA REPLICA

  • Fixed component ID

    5725E3000

Applicable component levels

  • RA20 PSY

       UP

Document Information

Modified date:
27 November 2014