Apache ZooKeeper
Provides a centralized infrastructure and services that enable synchronization across an Apache Hadoop cluster
What is Apache ZooKeeper?

ZooKeeper is an open source Apache project that provides a centralized service for configuration information, naming, synchronization and group services across large clusters in distributed systems. The goal is to make these systems easier to manage, with more reliable propagation of changes.

How does ZooKeeper work?

If you had a Hadoop cluster spanning 500 or more commodity servers, you would need centralized management of the entire cluster: naming, group and synchronization services, configuration management, and more. Other open source projects that run on Hadoop clusters also require cross-cluster services. Embedding ZooKeeper means you don’t have to build synchronization services from scratch. Interaction with ZooKeeper occurs by way of a Java™ or C interface.

For applications, ZooKeeper provides an infrastructure for cross-node synchronization by maintaining status-type information in memory on ZooKeeper servers. A ZooKeeper server keeps a copy of the state of the entire system and persists this information in local log files. Large Hadoop clusters are supported by multiple ZooKeeper servers, with a master server (the leader, in ZooKeeper's terminology) synchronizing the top-level servers.

Within ZooKeeper, an application can create what is called a znode, which is a file that persists in memory on the ZooKeeper servers. The znode can be updated by any node in the cluster, and any node in the cluster can register to be notified of changes to that znode.

Put simply, applications can synchronize their tasks across the distributed cluster by updating their status in a ZooKeeper znode. ZooKeeper then notifies the nodes that registered for changes to that znode of the status change. This cluster-wide status centralization service is critical for management and serialization tasks across a large distributed set of servers.
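
The update-and-notify pattern described above can be sketched with a small in-memory model. This is a toy illustration in Python, not the ZooKeeper client API: the class ToyZNodeStore, its methods and the path /workers/node-1 are hypothetical names chosen for this example. It assumes one-shot watches, meaning a watch fires once and must be registered again to see later changes, and it assumes the notification carries only the path, so the watcher reads the new value itself.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Watch = Callable[[str], None]


class ToyZNodeStore:
    """Toy in-memory stand-in for a ZooKeeper server (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watch]] = defaultdict(list)

    def create(self, path: str, value: bytes) -> None:
        """Create a znode; fail if the path is already taken."""
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        self._data[path] = value

    def get(self, path: str, watch: Optional[Watch] = None) -> bytes:
        """Read a znode and optionally register a one-shot watch on it."""
        value = self._data[path]
        if watch is not None:
            self._watches[path].append(watch)
        return value

    def set(self, path: str, value: bytes) -> None:
        """Update a znode, then notify every watcher registered on it."""
        if path not in self._data:
            raise KeyError(f"no such znode: {path}")
        self._data[path] = value
        # One-shot semantics: remove the watches before firing them.
        for callback in self._watches.pop(path, []):
            callback(path)


store = ToyZNodeStore()
store.create("/workers/node-1", b"starting")

seen = []


def on_change(path: str) -> None:
    # The notification only names the znode; read the new value explicitly.
    seen.append((path, store.get(path)))


store.get("/workers/node-1", watch=on_change)
store.set("/workers/node-1", b"ready")  # fires on_change once
store.set("/workers/node-1", b"busy")   # watch already consumed, no notification
```

After this runs, seen holds a single entry for the first update. A node that wants to follow every change would register a new watch inside its callback.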

IBM and Cloudera have partnered to offer an industry-leading, enterprise-grade Hadoop distribution, including an integrated ecosystem of products and services to support faster analytics at scale.
ZooKeeper is a reliable, scalable and high-performance coordination service for distributed systems that works out of the box and is made with developers in mind.

Reduces implementation effort

ZooKeeper's simple architecture makes it easier for you to implement typical coordination tasks like electing a master server, managing group membership and managing metadata in distributed environments.
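
As a rough illustration of two of these tasks, the sketch below models master election and group membership in plain Python. It is a toy model under stated assumptions, not ZooKeeper code: the class ToyElection and the server names are hypothetical. It mimics the commonly documented recipe in which each candidate registers an entry that receives an increasing sequence number and disappears when the candidate leaves, and the candidate with the lowest remaining number is treated as the master.

```python
import itertools
from typing import Dict, List, Optional


class ToyElection:
    """Toy model of election via sequence numbers (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._candidates: Dict[int, str] = {}  # sequence number -> server name

    def join(self, server: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = next(self._counter)
        self._candidates[seq] = server
        return seq

    def leave(self, seq: int) -> None:
        """Remove a candidate, as if its session had ended."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The candidate with the lowest sequence number is the master."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]

    def members(self) -> List[str]:
        """Current group membership, in join order."""
        return [self._candidates[seq] for seq in sorted(self._candidates)]


election = ToyElection()
a = election.join("server-a")
b = election.join("server-b")
c = election.join("server-c")

first_leader = election.leader()   # server-a joined first
election.leave(a)                  # server-a goes away
second_leader = election.leader()  # the next-lowest number takes over
current_members = election.members()
```

The appeal of this design is that no candidate has to be told it won: each one only needs to check whether its own number is currently the lowest.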

Avoids development hassles

Use ZooKeeper to maintain centralized configuration information, naming, synchronization and group services through a simple interface, without writing them from scratch.

Connects with ease, speed and reliability

ZooKeeper stores and mediates updates to important configuration information for distributed applications in a reliable, fast and ordered manner.

Because of its versatility in distributed systems, ZooKeeper has a diverse set of practical use cases. Here are a few typical applications that rely on ZooKeeper.

Apache Hadoop

Hadoop uses ZooKeeper for automatic failover of the HDFS NameNode and for high availability of the YARN ResourceManager.

Apache HBase

HBase uses ZooKeeper for main controller election, lease management of region servers and other communication between region servers.

Cloudera Search

Cloudera Search uses ZooKeeper for centralized configuration management to integrate search functionality with Hadoop through Apache Solr.

Next steps

Schedule a no-cost, one-on-one call with an IBM big data expert to learn about how we can help you extend data science and machine learning across the Apache Hadoop ecosystem.
