ZooKeeper connection errors

ZooKeeper troubleshooting 1:

After nodes are replaced or added, the cinder and gnocchi clients cannot connect to the ZooKeeper server.
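One quick way to check whether a server answers is to send it the ZooKeeper srvr four-letter command and read the Mode line of the reply. The sketch below only parses that reply. It assumes the default client port 2181, that nc is installed, and that srvr is permitted by 4lw.commands.whitelist; the helper name zk_mode and the host names node1..node3 are hypothetical.

```shell
# zk_mode (hypothetical helper): read the reply of the ZooKeeper "srvr"
# four-letter command on stdin and print the server role
# (leader, follower or standalone). Prints "unknown" and returns 1 when
# no Mode line is found (server unreachable or command not allowed).
zk_mode() {
    mode=$(awk -F': ' '/^Mode:/ { m = $2 } END { print m }')
    if [ -n "$mode" ]; then
        printf '%s\n' "$mode"
    else
        printf 'unknown\n'
        return 1
    fi
}

# Example (assumes nc is installed and the client port is 2181):
#   for host in node1 node2 node3; do
#       printf '%s: ' "$host"
#       echo srvr | nc -w 5 "$host" 2181 | zk_mode
#   done
```

A healthy three-node ensemble should report one leader and two followers.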

Typical errors in the logs are shown below.

In cinder-api.log:

CRITICAL cinder [-] Unhandled error: tooz.coordination.ToozConnectionError: Operational error: Connection time-out
ERROR cinder Traceback (most recent call last):
ERROR cinder   File "/usr/lib/python3.6/site-packages/tooz/drivers/zookeeper.py", line 150, in _start
ERROR cinder     self._coord.start(timeout=self.timeout)
ERROR cinder   File "/usr/lib/python3.6/site-packages/kazoo/client.py", line 635, in start
ERROR cinder     raise self.handler.timeout_exception("Connection time-out")
ERROR cinder kazoo.handlers.threading.KazooTimeoutError: Connection time-out

In the gnocchi metricd log:

WARNING  tooz.coordination: Retrying tooz.drivers.zookeeper.KazooDriver.heartbeat in 1.0 seconds as it raised Connection has been closed.
ERROR    gnocchi.cli.metricd: Unexpected error updating the task partitioner: Connection has been closed
ERROR    gnocchi.cli.metricd: Unexpected error during processing job

In the ZooKeeper log:

[myid:] - ERROR [SyncThread:1:o.a.z.s.ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:1
java.lang.NullPointerException: null
        at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:67)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:248)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)
[myid:] - ERROR [LearnerHandler-/172.26.3.219:55954:o.a.z.s.q.LearnerHandler@719] - Unexpected exception in LearnerHandler:
java.io.EOFException: null
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
	at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
	at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
	at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:541)

Explanation 1

Server not coming up because of file corruption: a ZooKeeper server may be unable to read its database, and therefore fail to start, if its transaction logs are corrupted. In that case the ZooKeeper log file shows IOException errors, such as the java.io.EOFException above.
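To check whether a node shows these symptoms, its ZooKeeper log can be searched for the exceptions quoted above. A minimal sketch; the default log path /var/log/zookeeper/zookeeper.log is an assumption (adjust it to the installation) and the helper name zk_scan_log is hypothetical.

```shell
# zk_scan_log (hypothetical helper): print, with line numbers, the lines of a
# ZooKeeper log that match the errors seen with corrupted transaction logs.
# Returns 0 if a match was found, 1 if none, 2 if the file cannot be read.
zk_scan_log() {
    log=${1:-/var/log/zookeeper/zookeeper.log}   # assumed default path
    if [ ! -r "$log" ]; then
        printf 'cannot read %s\n' "$log" >&2
        return 2
    fi
    grep -nE 'IOException|EOFException|Severe unrecoverable error' "$log"
}

# Example: zk_scan_log /var/log/zookeeper/zookeeper.log
```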

Resolution 1

In such a case, clear all files in /var/lib/zookeeper/version-2 on each of the three nodes. The commands below move the directory aside as a backup instead of deleting it:

cd /var/lib/zookeeper
mv version-2 version-2.bak                # keep the old data as a backup
mkdir version-2                           # recreate an empty data directory
chown -R zookeeper:zookeeper version-2    # ZooKeeper must own it
icic-services restart
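The same steps can be wrapped in a function that uses a timestamped backup name. This is a sketch under stated assumptions, not part of the documented procedure: the helper name zk_reset_txnlogs is hypothetical, and chown is only attempted when running as root and a zookeeper user exists.

```shell
# zk_reset_txnlogs (hypothetical helper): move <base>/version-2 aside to a
# timestamped backup, recreate it empty and print the backup path.
# Run it on each node, then restart the services (icic-services restart).
zk_reset_txnlogs() {
    base=${1:-/var/lib/zookeeper}
    backup="$base/version-2.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$base/version-2" ]; then
        printf 'no such directory: %s/version-2\n' "$base" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        printf 'backup already exists: %s\n' "$backup" >&2
        return 1
    fi
    mv "$base/version-2" "$backup" || return 1
    mkdir "$base/version-2" || return 1
    # Change ownership only when running as root and the zookeeper user exists.
    if [ "$(id -u)" -eq 0 ] && id zookeeper >/dev/null 2>&1; then
        chown -R zookeeper:zookeeper "$base/version-2" || return 1
    fi
    printf '%s\n' "$backup"
}
```

The timestamp matters on repeated runs: if version-2.bak already exists as a directory, a plain mv version-2 version-2.bak moves version-2 inside it instead of renaming it.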

ZooKeeper troubleshooting 2:

In the ZooKeeper log:

java.io.IOException: Leaders epoch, 3 is less than accepted epoch, 63
        at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:525)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:91)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1539)
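The message means that this node has already accepted epoch 63, while the leader it is trying to follow proposes epoch 3, so registration with the leader fails. To compare the stored values across nodes, the epoch files can be printed on each node. A minimal sketch; it assumes the data directory /var/lib/zookeeper/version-2 used in this note and that each file holds a short decimal number, and the helper name zk_show_epochs is hypothetical.

```shell
# zk_show_epochs (hypothetical helper): print the acceptedEpoch and
# currentEpoch values stored in a ZooKeeper data directory.
zk_show_epochs() {
    dir=${1:-/var/lib/zookeeper/version-2}
    for f in acceptedEpoch currentEpoch; do
        if [ -r "$dir/$f" ]; then
            # The file holds a decimal number, possibly without a trailing newline.
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=missing\n' "$f"
        fi
    done
}

# Example, run on every node and compare the output:
#   zk_show_epochs /var/lib/zookeeper/version-2
```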

Resolution 2

Delete the acceptedEpoch and currentEpoch files under /var/lib/zookeeper/version-2:

-rw-r--r--. 1 zookeeper zookeeper        1 May 22 21:17 acceptedEpoch
-rw-r--r--. 1 zookeeper zookeeper        1 May 22 21:17 currentEpoch
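As a variation on deleting the two files outright, they can be renamed so the old values stay available for comparison. A minimal sketch assuming the data directory above; the helper name zk_clear_epochs is hypothetical, and an existing .bak copy is overwritten. Run it on each node where the files are to be removed, before the restart.

```shell
# zk_clear_epochs (hypothetical helper): rename acceptedEpoch and currentEpoch
# to *.bak instead of deleting them. An existing .bak file is overwritten.
zk_clear_epochs() {
    dir=${1:-/var/lib/zookeeper/version-2}
    for f in acceptedEpoch currentEpoch; do
        if [ -f "$dir/$f" ]; then
            mv "$dir/$f" "$dir/$f.bak" || return 1
        fi
    done
}

# Example: zk_clear_epochs /var/lib/zookeeper/version-2
```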

Then run icic-services restart.