Troubleshooting
Problem
Heartbeat cannot start or be restarted if one of the appliance lower RPCs is unreachable
Symptom
You try to start heartbeat, but it fails with a long list of error messages.
If the lines below are visible, there is an issue checking the SNMP response from the lower RPCs that control the STONITH function in the cluster:
heartbeat[22784]: 2015/07/29_19:51:24 info: glib: apcmastersnmp_set_config: Starting apcmastersnmp V1.1
heartbeat[22784]: 2015/07/29_19:51:24 info: glib: apcmastersnmp_set_config: Hostnames are 10.0.128.32 and 10.0.128.132.
heartbeat[22784]: 2015/07/29_19:51:30 ERROR: glib: APC_read: error sending/receiving pdu (cliberr: 0 / snmperr: -24 / error: Timeout).
heartbeat[22784]: 2015/07/29_19:51:30 ERROR: glib: NZ_set_config: cannot read number of outlets on 1st rpc.
heartbeat[22784]: 2015/07/29_19:51:30 ERROR: Unknown Stonith config error parsing [ 10.0.128.32 161 private] [2]
heartbeat[22784]: 2015/07/29_19:51:30 debug: glib: PILS: Looking for HBcomm/use_logd => [/usr/lib64/heartbeat/plugins/HBcomm/use_logd.so]
heartbeat[22784]: 2015/07/29_19:51:30 debug: glib: Plugin file /usr/lib64/heartbeat/plugins/HBcomm/use_logd.so does not exist
heartbeat[22784]: 2015/07/29_19:51:30 debug: glib: PILS: Looking for HBcomm/use_logd => [/usr/lib64/pils/plugins/HBcomm/use_logd.so]
heartbeat[22784]: 2015/07/29_19:51:30 debug: glib: Plugin file /usr/lib64/pils/plugins/HBcomm/use_logd.so does not exist
heartbeat[22784]: 2015/07/29_19:51:30 info: Enabling logging daemon
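The APC_read timeout above can be reproduced outside heartbeat. A minimal sketch, assuming the net-snmp command-line tools are installed: it queries the standard sysDescr OID on each RPC, using the port (161) and community string ("private") shown in the Stonith config line of the log. The function name and the sysDescr OID are illustrative choices, not part of the heartbeat plugin itself.

```shell
# Verify that a lower RPC answers SNMP on port 161 with the "private"
# community string (both values taken from the log's Stonith config line).
# sysDescr (.1.3.6.1.2.1.1.1.0) is a standard OID any SNMP agent answers.
check_rpc_snmp() {
    host=$1
    echo "Checking SNMP response from $host ..."
    if ! command -v snmpget >/dev/null 2>&1; then
        echo "snmpget not installed; install the net-snmp utilities first"
        return 2
    fi
    if snmpget -v1 -c private -t 5 "$host":161 .1.3.6.1.2.1.1.1.0; then
        echo "$host: SNMP OK"
    else
        echo "$host: SNMP FAILED (matches the APC_read timeout in the log)"
        return 1
    fi
}

# Hostnames from this article's log; adjust for your cluster:
# check_rpc_snmp 10.0.128.32
# check_rpc_snmp 10.0.128.132
```

A failure here for only one of the two RPCs points at that device or its network path rather than at heartbeat itself.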
Cause
The rpc1ll host is unreachable, so the heartbeat start process cannot read the SNMP response:
[root@NZ35172-H1 ~]# ping rpc1lr
PING rpc1lr (10.0.128.32) 56(84) bytes of data.
64 bytes from rpc1lr (10.0.128.32): icmp_seq=1 ttl=255 time=2.54 ms
64 bytes from rpc1lr (10.0.128.32): icmp_seq=2 ttl=255 time=3.32 ms
64 bytes from rpc1lr (10.0.128.32): icmp_seq=3 ttl=255 time=3.28 ms
64 bytes from rpc1lr (10.0.128.32): icmp_seq=4 ttl=255 time=3.30 ms
64 bytes from rpc1lr (10.0.128.32): icmp_seq=5 ttl=255 time=3.29 ms
^C
--- rpc1lr ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4454ms
rtt min/avg/max/mdev = 2.540/3.149/3.322/0.311 ms
[root@NZ35172-H1 ~]# ping rpc1ll
PING rpc1ll (10.0.128.132) 56(84) bytes of data.
^C
--- rpc1ll ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 11867ms
Diagnosing The Problem
In case of a single rack:
Ping both rpc1lr and rpc1ll.
In case of a multi-rack system:
Ping the rack 1 and rack 2 lower RPCs (rpc1lr, rpc1ll, rpc2lr, rpc2ll).
Also run the following tool to diagnose possible management network issues:
/nz/kit/bin/adm/tools/nznetw
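The ping checks above can be sketched as a small loop. This is a hedged example: the function name is illustrative, and the default host list assumes the two-rack naming used in this article (pass your own hostnames for other configurations).

```shell
# Ping every lower RPC once and report which ones are unreachable.
# Defaults to the two-rack host list from this article; pass hostnames
# as arguments to override (e.g. only rpc1lr rpc1ll for a single rack).
ping_lower_rpcs() {
    hosts=${*:-"rpc1lr rpc1ll rpc2lr rpc2ll"}
    rc=0
    for h in $hosts; do
        if ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
            echo "$h: reachable"
        else
            echo "$h: UNREACHABLE"
            rc=1
        fi
    done
    return $rc
}

# ping_lower_rpcs                  # multi-rack (default list)
# ping_lower_rpcs rpc1lr rpc1ll    # single rack
```

Any RPC reported UNREACHABLE is a candidate for the kind of switch-port failure shown in the nznetw output below.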
Resolving The Problem
In this case, nznetw showed that the management switch port to rpc1ll was down:
******************************
Querying management switch(es)
******************************
Link Status for Management Switch 1
Port 1 : To rack 1 lower left RPC down [FAIL]
Port 2 : To rack 1 upper left RPC up [PASS]
Port 3 : To HA1 LOM3 (eth0) up [PASS]
Try to enable the port using the steps in
Document Information
Modified date:
17 October 2019
UID
swg21963833