Question & Answer
Question
What is the purpose of the ReplMissedMetadataHeartbeat event, what causes it to trigger, and can we prevent the alerts from showing so often?
Cause
Replication nodes send metadata heartbeat messages to each other at 30-second intervals, and each node keeps track of when it last received one. The ReplMissedMetadataHeartbeat event is generated in one of two situations:
• A metadata heartbeat from a replication node is missed for more than 5 intervals. The event is generated for each node.
• The PTS latency for files exceeds 10 seconds.
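The first condition comes down to simple arithmetic: 5 intervals of 30 seconds is 150 seconds. Below is a minimal shell sketch of that check. It assumes the comparison is made against the time elapsed since the last received heartbeat; the function name and the epoch-second inputs are hypothetical and are not part of the product.

```shell
#!/bin/sh
# Illustrative only: the real check happens inside the replication software.
HB_INTERVAL=30        # seconds between metadata heartbeats
HB_MISSED_LIMIT=5     # intervals that may elapse before the event is raised

# heartbeat_missed LAST_SEEN NOW   (both in epoch seconds; hypothetical helper)
# Succeeds (exit status 0) when the gap is more than 5 intervals (150 seconds).
heartbeat_missed() {
    gap=$(( $2 - $1 ))
    [ "$gap" -gt $(( HB_INTERVAL * HB_MISSED_LIMIT )) ]
}
```

For example, heartbeat_missed 1000 1151 succeeds (a 151-second gap), while heartbeat_missed 1000 1150 does not, which shows how a delay of a single second can be enough to cross the threshold.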
The alerts appear often mostly because of how the event system is designed: there is no time threshold after which older events can "age" out. With no aggregation in use, an event is generated every time a heartbeat is missed. Often these messages stem from an isolated occurrence in which the delivery of the heartbeat was delayed only slightly past the event threshold.
Using aggregation avoids this; however, customers can then get what seem like random serious events, because the occurrences accrue over time and are all reported at once. It is important to check the timestamps of these errors to verify whether they occurred during a specific window or over a long duration.
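One rough way to do that timestamp check is to measure the span between the earliest and the latest occurrence. The sketch below assumes you have already converted the timestamps from the notifications to epoch seconds, one per line; that input format and the function name are hypothetical.

```shell
#!/bin/sh
# event_span: reads one epoch-second timestamp per line on stdin (hypothetical
# input format) and prints the number of seconds between the earliest and the
# latest timestamp. Blank lines are ignored.
event_span() {
    awk 'NF == 0 { next }
         !seen || $1 < min { min = $1 }
         !seen || $1 > max { max = $1 }
         { seen = 1 }
         END { if (seen) print max - min }'
}
```

A small span suggests the missed heartbeats were clustered in one window; a span of days or weeks suggests isolated delays that accumulated until the aggregation count was reached.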
Answer
You can get more detailed missed heartbeat information by using the command:
nzreplstate -heartbeat
You can also check which events are currently enabled, and how replMissedMetadataHeartbeat is configured, with:
nzevent show
If there are a large number of events, you can grep for the specific one you are looking for, for example:
nzevent -syntax | grep -i ReplMissedMetadataHeartbeat
In the example below, the ReplMissedMetadataHeartbeat notification is enabled with no aggregation (-eventAggrCount 0), meaning it sends an email for each alert. Most customers would probably want this set to a higher number so that they do not receive the false alerts that often occur.
-name 'replMissedMetadataHeartbeat' -on yes -eventType replMissedMetadataHeartbeat -eventArgsExpr '' -notifyType email -dst 'michael@ibm.com' -ccDst '' -msg 'replMissedMetadataHeartbeat' -bodyText '' -callHome no -eventAggrCount 0
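To read the aggregation setting out of a rule line like the one above without scanning it by eye, a small filter can help. This is a sketch that assumes the rule text has the same layout as the example line; the function name is hypothetical.

```shell
#!/bin/sh
# aggr_count: reads rule lines on stdin and prints the number that follows
# -eventAggrCount on every line that contains one.
aggr_count() {
    sed -n 's/.*-eventAggrCount[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p'
}
```

Piping the output of the grep command above into aggr_count would print 0 for the rule shown here.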
Here is another example, which sets the aggregation count to 10 on the replMissedMetadataHeartbeat event, running as the admin user:
nzevent modify -u admin -pw password -name replMissedMetadataHeartbeat -on yes -dst email@ibm.com -eventAggrCount 10
Document Information
Modified date:
17 October 2019
UID
swg21699318