What is NameNode Automatic Failover in Hadoop HDFS?
In this Hadoop tutorial, we discuss NameNode automatic failover in Hadoop. First, we look at what failover is and the types of failover. Then we cover automatic failover in HDFS and the components it adds: the ZooKeeper quorum and the ZKFailoverController (ZKFC) process. Finally, we discuss the role each of these two components plays.
2. What is Failover?
Failover is the process in which a system transfers control to a secondary system when it detects a failure.
Failover is of two types:
- Graceful Failover – The administrator initiates a graceful failover manually, for example during routine maintenance. In this mode the system does not switch from the active to the standby NameNode on its own, even if the active node has failed.
- Automatic Failover – The system automatically transfers control to the standby NameNode when it detects that the active NameNode has failed. No administrator action is needed to start the failover.
3. NameNode Automatic Failover in Hadoop
Automatic failover adds the following components to a Hadoop HDFS deployment:
- ZooKeeper quorum.
- ZKFailoverController Process (ZKFC).
3.1. ZooKeeper Quorum
ZooKeeper is a centralized service that maintains configuration information and naming, and provides distributed synchronization and group services. It is used to coordinate a large cluster of machines.
The implementation of automatic HDFS failover relies on ZooKeeper for the following:
- Failure Detection – Each NameNode machine maintains a session in ZooKeeper. If the machine fails, its session expires, and ZooKeeper notifies the other NameNode that it should start the failover process.
- Active NameNode Election – ZooKeeper provides a simple mechanism to elect exactly one node as active. When the active NameNode fails, another NameNode can take an exclusive lock in ZooKeeper, indicating that it should become the next active NameNode.
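The two roles above can be illustrated with a small toy model. This is only a sketch, not the real ZooKeeper client API: `ToyZooKeeper`, `create_ephemeral`, `watch_deletions`, `expire_session`, and the lock path are all hypothetical names invented for this example. It assumes the lock behaves like an ephemeral entry that disappears when its owner's session expires.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper (hypothetical, not the real client API)."""

    def __init__(self):
        self.ephemeral = {}  # znode path -> id of the session that owns it
        self.watchers = []   # callbacks fired when a znode is deleted

    def create_ephemeral(self, path, session_id):
        """Create the znode if it is absent; only the creator gets True."""
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session_id
        return True

    def watch_deletions(self, callback):
        self.watchers.append(callback)

    def expire_session(self, session_id):
        """Simulate a failed machine: drop its znodes, then notify watchers."""
        removed = [p for p, s in self.ephemeral.items() if s == session_id]
        for path in removed:
            del self.ephemeral[path]
        for path in removed:
            for callback in list(self.watchers):
                callback(path)


LOCK_PATH = "/toy-ha/active-lock"  # hypothetical path, chosen for this sketch

zk = ToyZooKeeper()

# Active NameNode election: only one of the two requests can succeed.
nn1_active = zk.create_ephemeral(LOCK_PATH, "nn1")
nn2_active = zk.create_ephemeral(LOCK_PATH, "nn2")

events = []

def nn2_on_delete(path):
    # Failure detection: the standby is told the lock vanished and tries to take it.
    if path == LOCK_PATH and zk.create_ephemeral(LOCK_PATH, "nn2"):
        events.append("nn2 became active")

zk.watch_deletions(nn2_on_delete)

# nn1's machine fails, so its session expires and the lock is released.
zk.expire_session("nn1")
active_after_failure = zk.ephemeral[LOCK_PATH]

print(nn1_active, nn2_active, events, active_after_failure)
```

In this sketch `nn1` wins the first election, `nn2` is refused, and after `nn1`'s session expires `nn2` is notified and takes the lock.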
3.2. ZKFailoverController (ZKFC)
ZKFC is a ZooKeeper client that monitors and manages the state of the NameNode. Each machine that runs a NameNode also runs a ZKFC. It is responsible for:
- Health monitoring – The ZKFC periodically pings its local NameNode with a health-check command. As long as the NameNode responds in time, the ZKFC considers it healthy. If the node has crashed or entered an unhealthy state, the health monitor marks it as unhealthy.
- ZooKeeper session management – While the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is also active, the ZKFC holds a special “lock” znode as well. If the session expires, the lock znode is deleted automatically.
- ZooKeeper-based election – If the local NameNode is healthy and the ZKFC sees that no other node currently holds the lock znode, it tries to acquire the lock itself. If it succeeds, it has “won the election” and is responsible for running a failover to make its local NameNode active.
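The three ZKFC responsibilities above can be sketched as a single monitoring round. This is a simplified toy model under stated assumptions, not Hadoop's actual ZKFC code: `ToyLockService`, `ToyNameNode`, `ToyZKFC`, and `tick` are hypothetical names, the lock service stands in for the lock znode, and an unhealthy node is simply demoted to standby.

```python
class ToyLockService:
    """Stand-in for the lock znode: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        if self.holder is None:
            self.holder = owner
        return self.holder == owner

    def release(self, owner):
        if self.holder == owner:
            self.holder = None


class ToyNameNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True      # flipped by the test scenario below
        self.state = "standby"


class ToyZKFC:
    """Hypothetical controller pairing one NameNode with the lock service."""

    def __init__(self, namenode, locks):
        self.nn = namenode
        self.locks = locks
        self.session_open = False

    def tick(self):
        """One round: health check, session management, then election."""
        if not self.nn.healthy:
            # Health monitoring failed: close the session, which drops the lock.
            self.session_open = False
            self.locks.release(self.nn.name)
            self.nn.state = "standby"  # toy simplification for a failed node
            return
        # Session management: keep the session open while the node is healthy.
        self.session_open = True
        # Election: whoever holds the lock makes its local NameNode active.
        if self.locks.try_acquire(self.nn.name):
            self.nn.state = "active"
        else:
            self.nn.state = "standby"


locks = ToyLockService()
nn1, nn2 = ToyNameNode("nn1"), ToyNameNode("nn2")
zkfc1, zkfc2 = ToyZKFC(nn1, locks), ToyZKFC(nn2, locks)

def run_round():
    zkfc1.tick()
    zkfc2.tick()
    return nn1.state, nn2.state

before = run_round()   # ("active", "standby")
nn1.healthy = False    # simulate a failure of the active NameNode
after = run_round()    # ("standby", "active")
print(before, after)
```

Each controller only ever acts on its own local NameNode; the shared lock is what keeps the two from becoming active at the same time.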
To summarize, automatic failover in Hadoop starts without administrator intervention when the active NameNode fails. It adds two components to an HDFS deployment: a ZooKeeper quorum and the ZKFailoverController (ZKFC) process. ZooKeeper provides failure detection and active NameNode election, and the ZKFC uses them to monitor its local NameNode and carry out the failover.