{"id":211,"date":"2017-09-29T09:18:35","date_gmt":"2017-09-29T09:18:35","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=211"},"modified":"2017-09-29T09:18:35","modified_gmt":"2017-09-29T09:18:35","slug":"hadoop-hdfs-namenode-high-availability-hadoop","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/","title":{"rendered":"HDFS NameNode High Availability in Hadoop"},"content":{"rendered":"<p>In our previous blog, we studied the<strong> Hadoop Introduction<\/strong> and <strong>Features of Hadoop<\/strong>. In this blog, we are going to cover the HDFS NameNode High Availability feature in detail.<\/p>\n<p>First, we will discuss the HDFS NameNode High Availability architecture, and then its implementation using Quorum Journal Nodes and Shared Storage.<\/p>\n<h3>HDFS NameNode High Availability<\/h3>\n<p>In <strong>HDFS<\/strong>, data is\u00a0highly available\u00a0and accessible despite hardware failure. HDFS is a highly reliable storage system designed for storing very large files.<\/p>\n<p>HDFS follows a master\/slave topology in which the master is the <strong>NameNode<\/strong> and the slaves are the <strong>DataNodes<\/strong>. The NameNode stores metadata, which includes the number of blocks, their locations, replicas, and other details. For faster retrieval of data, this metadata is kept on the master. The NameNode manages the slave nodes and assigns tasks to them.<\/p>\n<p>The NameNode was a <strong>Single Point of Failure (SPOF)<\/strong> before Hadoop 2.0, because an HDFS cluster had a single NameNode. 
If that NameNode failed, the whole cluster went down.<\/p>\n<p>A single point of failure limits high availability in the following ways:<\/p>\n<ul>\n<li>If an unplanned event such as a node crash occurs, the cluster is unavailable until an operator restarts the NameNode.<\/li>\n<li>Planned maintenance activities, such as hardware upgrades on the NameNode, also result in downtime of the Hadoop cluster.<\/li>\n<\/ul>\n<h3>HDFS NameNode High Availability Architecture<\/h3>\n<p>Hadoop 2.0 overcomes this<strong> SPOF<\/strong> by supporting multiple NameNodes. The HDFS NameNode High Availability architecture provides the option of running two redundant NameNodes in the same cluster in an active\/passive configuration with a hot standby.<\/p>\n<ul>\n<li><strong>Active NameNode\u00a0<\/strong>\u2013 It handles all HDFS client operations in the HDFS cluster.<\/li>\n<li><strong>Passive NameNode\u00a0<\/strong>\u2013 It is a standby NameNode that holds the same metadata as the active NameNode.<\/li>\n<\/ul>\n<p>So, whenever the active NameNode fails, the passive NameNode takes over all the responsibilities of the active node, and the HDFS cluster continues to work.<\/p>\n<p>Issues in maintaining consistency in the HDFS High Availability cluster are as follows:<\/p>\n<ul>\n<li>The active and standby NameNodes should always be in sync with each other, i.e. they should have the same metadata. This allows the\u00a0Hadoop cluster to be restored to the same namespace state it had when the crash occurred, and it enables fast failover.<\/li>\n<li>There should be only one active NameNode at a time. Otherwise, two active NameNodes will lead to corruption of the data. We call this a \u201c<strong>Split-Brain Scenario<\/strong>\u201d, in which a cluster gets divided into smaller clusters. Each one believes that it is the only active cluster. 
\u201cFencing\u201d avoids such a scenario: it is the process of ensuring that only one NameNode remains active at a particular time.<\/li>\n<\/ul>\n<h3>Implementation of Hadoop High Availability Architecture<\/h3>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-73164 size-full\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg\" alt=\"HDFS NameNode High Availbility\" width=\"1200\" height=\"628\" \/><\/a><\/p>\n<p>Two NameNodes run at the same time in the HDFS NameNode High Availability architecture. The active and standby NameNode configuration can be implemented in the following two ways:<\/p>\n<ul>\n<li>Using Quorum Journal Nodes<\/li>\n<li>Using Shared Storage<\/li>\n<\/ul>\n<h4>1. Using Quorum Journal Nodes<\/h4>\n<p><strong>Quorum Journal Nodes<\/strong>\u00a0(QJN) is an HDFS implementation that stores the edit logs and shares them between the active and standby NameNodes.<\/p>\n<p>For high availability, the standby NameNode stays synchronized with the active NameNode through a group of daemons called \u201cJournal nodes\u201d. The Quorum Journal Nodes run as a group of journal nodes, and there should be at least three of them.<\/p>\n<p>For N journal nodes, the system can tolerate at most (N-1)\/2 failures (rounded down) and still continue to work. So, for three journal nodes, the system can tolerate the failure of one {(3-1)\/2} of them.<\/p>\n<p>Whenever the active NameNode performs a namespace modification, it logs a record of the modification to a majority of the journal nodes.<\/p>\n<p>The standby node continuously reads the edits from the journal nodes and applies them to its own namespace. In the case of failover, the standby will ensure that it has read all the edits from the journal nodes before promoting itself to the Active state. 
This ensures that the namespace state is completely synchronized before failover occurs.<\/p>\n<p>To provide a fast failover, the standby node must have up-to-date information about the location of\u00a0data blocks\u00a0in the cluster. For this to happen, all the DataNodes are configured with the addresses of both NameNodes, and they send block location information and heartbeats to both.<\/p>\n<h5><strong>Fencing of NameNode<\/strong><\/h5>\n<p>For the correct operation of an HA cluster, only one of the NameNodes should be active at a time. Otherwise, the namespace state would diverge between the two NameNodes. Fencing is the process that ensures this property in a cluster.<\/p>\n<ul>\n<li>The journal nodes perform this fencing by allowing only one NameNode to be the writer at a time.<\/li>\n<li>During failover, the standby NameNode takes over the responsibility of writing to the journal nodes and prevents any other NameNode from remaining active.<\/li>\n<li>Finally, the new active NameNode can perform its activities.<\/li>\n<\/ul>\n<h4>2. Using Shared Storage<\/h4>\n<p>The standby and active NameNodes synchronize with each other by using a \u201cshared storage device\u201d. For this implementation, both the active NameNode and the standby NameNode must have access to a particular directory on the shared storage device (for example, a Network File System mount).<\/p>\n<p>When the active NameNode performs any namespace modification, it logs a record of the modification to an edit log file stored in the shared directory. The standby NameNode watches this directory for edits, and when edits occur, the standby NameNode applies them to its own namespace. In the case of failover, the standby NameNode will ensure that it has read all the edits from the shared storage before promoting itself to the Active state. 
This ensures that the namespace state is completely synchronized before failover occurs.<\/p>\n<p>To prevent the \u201csplit-brain scenario\u201d in which the namespace state diverges between the two NameNodes, an administrator must configure at least one fencing method for the shared storage.<\/p>\n<h3>Conclusion<\/h3>\n<p>Hence, Hadoop 2.0 HDFS HA provides a single active NameNode and a single standby NameNode. But some deployments need a higher degree of<strong>\u00a0fault tolerance<\/strong>. Hadoop 3.0 allows the user to run multiple standby NameNodes.<\/p>\n<p>For example, with five journal nodes and three NameNodes, the Hadoop cluster is able to tolerate the failure of two nodes rather than one.<\/p>\n<p>Please share your experience and suggestions related to HDFS NameNode High Availability in the comment section below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our previous blog, we studied the Hadoop Introduction and Features of Hadoop. In this blog, we are going to cover the HDFS NameNode High Availability feature in detail. 
First of all, we&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73164,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[564],"tags":[],"class_list":["post-211","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hdfs"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>HDFS NameNode High Availability in Hadoop - TechVidvan<\/title>\n<meta name=\"description\" content=\"Hadoop HDFS NameNode High Availability cover What is SPOF,What is Hadoop High Availability,HDFS NameNode High Availability Architecture,Quorum Journal Nodes\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"HDFS NameNode High Availability in Hadoop - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Hadoop HDFS NameNode High Availability cover What is SPOF,What is Hadoop High Availability,HDFS NameNode High Availability Architecture,Quorum Journal Nodes\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-09-29T09:18:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" 
\/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"HDFS NameNode High Availability in Hadoop - TechVidvan","description":"Hadoop HDFS NameNode High Availability cover What is SPOF,What is Hadoop High Availability,HDFS NameNode High Availability Architecture,Quorum Journal Nodes","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"HDFS NameNode High Availability in Hadoop - TechVidvan","og_description":"Hadoop HDFS NameNode High Availability cover What is SPOF,What is Hadoop High Availability,HDFS NameNode High Availability Architecture,Quorum Journal Nodes","og_url":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-09-29T09:18:35+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan 
Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"HDFS NameNode High Availability in Hadoop","datePublished":"2017-09-29T09:18:35+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/"},"wordCount":1018,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg","articleSection":["HDFS Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/","url":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/","name":"HDFS NameNode High Availability in Hadoop - 
TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg","datePublished":"2017-09-29T09:18:35+00:00","description":"Hadoop HDFS NameNode High Availability cover What is SPOF,What is Hadoop High Availability,HDFS NameNode High Availability Architecture,Quorum Journal Nodes","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-NameNode-High-Availbility-2-01.jpg","width":1200,"height":628,"caption":"HDFS NameNode High Availbility"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-namenode-high-availability-hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"HDFS NameNode High Availability in Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=211"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/211\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73164"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}