{"id":2000,"date":"2017-10-05T07:19:59","date_gmt":"2017-10-05T07:19:59","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=372"},"modified":"2017-10-05T07:19:59","modified_gmt":"2017-10-05T07:19:59","slug":"hadoop-hdfs-erasure-coding","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/","title":{"rendered":"HDFS Erasure Coding in Big Data Hadoop"},"content":{"rendered":"<p>This blog is all about HDFS Erasure Coding. We will discuss the concept of erasure coding in <strong>Hadoop<\/strong> and the problems with the old replication scheme. We will also cover the two algorithms used for Hadoop erasure coding: the XOR algorithm and the Reed-Solomon algorithm.<\/p>\n<p>Finally, we will look at the architecture and the advantages of erasure coding in Hadoop HDFS.<\/p>\n<h3>Problem with the Old Replication Scheme<\/h3>\n<p><strong>HDFS Erasure Coding<\/strong> is a feature introduced to reduce storage overhead to approximately 50%, compared with 200% for 3x replication. By default, HDFS replicates each block three times. Replication is a very simple form of redundancy that shields against DataNode failure.<\/p>\n<p>Replication has its advantages, but it is very expensive: 3x replication has a 200% overhead in storage space and other resources. For datasets with low I\/O activity, the additional replicas are rarely accessed during normal operation but still consume the same resources as the first replica.<\/p>\n<p>This is the reason that Hadoop erasure coding came into existence. 
It provides the same level of fault tolerance with much less storage space: the storage overhead is no more than 50%.<\/p>\n<p>When comparing different storage schemes, two considerations are important:<\/p>\n<ul>\n<li>Data durability (the number of simultaneous failures that can be tolerated)<\/li>\n<li>Storage efficiency (the ratio of logical data size to raw storage used)<\/li>\n<\/ul>\n<p>In N-way replication, fault tolerance is N-1 and storage efficiency is 1\/N. For example, 3x replication tolerates 2 failures with a storage efficiency of 1\/3 (about 33%).<\/p>\n<h3>What is HDFS Erasure Coding in Hadoop?<\/h3>\n<p>The best-known use of erasure coding (EC) in storage systems is <strong>RAID<\/strong>. RAID implements EC through striping. Striping divides logically sequential data, such as a file, into smaller units, such as blocks, and stores consecutive <strong>blocks<\/strong> on different disks. For each stripe, parity is calculated and stored; this is called encoding. If a unit is lost, it is recovered from the surviving units and the parity.<\/p>\n<p>For fault tolerance, EC extends a message with redundant data. HDFS erasure coding operates on uniformly sized data cells. The <strong>codec<\/strong> takes a number of data cells as input and produces a number of parity cells as output.<\/p>\n<p>This whole process is called encoding. The data cells and parity cells together are called an erasure coding group. 
The process of reconstructing a lost data cell from the remaining cells is known as decoding.<\/p>\n<p>Two algorithms are available for HDFS erasure coding:<\/p>\n<h4>a) XOR Algorithm<\/h4>\n<p>It is the simplest implementation of Hadoop erasure coding.<\/p>\n<p>Assume X, Y and Z are data cells. The parity cell is the XOR of these three data cells, <strong>x \u2295 y \u2295 z<\/strong>. The XOR operation generates only one parity cell, and if any one cell is lost, it can be recovered by XORing the remaining data cells with the parity cell. For example, if Y is lost, y = x \u2295 z \u2295 (x \u2295 y \u2295 z).<\/p>\n<p>XOR is very limited: because it produces only one parity cell, it can tolerate only 1 failure in a group of size n.<\/p>\n<p>\u201c<em>With XOR, fault tolerance is 1 and storage efficiency is (n-1)\/n when the group size is n.<\/em>\u201d<\/p>\n<h4>b) Reed-Solomon Algorithm<\/h4>\n<p>Reed-Solomon (RS) addresses the limitation of XOR. It uses linear algebra to generate multiple parity cells. RS takes two parameters, k and m, where k is the number of data cells and m is the number of parity cells.<\/p>\n<p>RS works by multiplying the k data cells by a generator matrix (G<sup>T<\/sup>) to produce an extended codeword with k data cells and m parity cells. After a storage failure, the lost cells can be recovered by taking any k surviving cells of the extended codeword and multiplying them by the inverse of the matching k rows of the generator matrix, as long as k out of the k+m cells are available.<\/p>\n<p>\u201c<em>With Reed-Solomon, fault tolerance is up to m cells and storage efficiency is k\/(k+m), where k is the number of data cells and m is the number of parity cells.<\/em>\u201d For example, RS(6,3) tolerates up to 3 lost cells with a storage efficiency of 6\/9 (about 67%), which is a storage overhead of 50%.<\/p>\n<h3>Design Decisions and Architecture<\/h3>\n<p>EC striping has several advantages:<\/p>\n<ul>\n<li>Striping enables online EC (writing data immediately in EC format), avoiding a conversion phase and immediately saving storage space.<\/li>\n<li>It distributes a small file across multiple DataNodes. It eliminates the need to bundle multiple files into a single coding group. 
This simplifies file operations such as deletion and migration between federated namespaces.<\/li>\n<li>To better support small files, EC supports striping. In the future, HDFS will also support a contiguous EC layout.<\/li>\n<\/ul>\n<p>EC adds several new components:<\/p>\n<ul>\n<li><strong>NameNode Extensions (ECManager)<\/strong> &#8211; Striped HDFS files are logically composed of block groups, each of which contains a certain number of internal blocks. To reduce the NameNode memory consumed by these additional blocks, a new hierarchical block naming protocol was introduced. The ID of a block group can be inferred from the ID of any of its internal blocks. This allows management at the level of the block group rather than the block.<\/li>\n<li><strong>Client Extensions (EC Client)<\/strong> &#8211; The client can perform read and write operations on multiple internal blocks in a block group in parallel.<\/li>\n<li><strong>DataNode Extensions (ECWorker)<\/strong> &#8211; The DataNode runs an additional ECWorker task for recovering failed erasure-coded blocks. The NameNode detects failed EC blocks and then chooses a DataNode to do the recovery work. The recovery task is passed to that DataNode as a heartbeat response.<\/li>\n<\/ul>\n<h3>Benefits of Erasure Coding<\/h3>\n<ul>\n<li><strong>Data availability at lower capacity: <\/strong>HDFS erasure coding enables data availability at lower capacity. With replication, each block is stored as three replicas, so the storage space consumed is large. 
With erasure coding, only parity cells are stored in addition to the data, so much less storage space is needed.<\/li>\n<li><strong>Performance: <\/strong>Because EC stores parity cells instead of two extra full replicas, less data is written overall, which can improve write performance for large files.<\/li>\n<li><strong>Fast recovery: <\/strong>It discovers and recovers HDFS block errors both actively (in the background) and passively (on the read path).<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>In conclusion, HDFS erasure coding reduces storage overhead from 200% under 3x replication to about 50%. The saving comes from storing parity cells instead of full replicas. Hence, these <strong>HDFS features<\/strong> empower Apache Hadoop functionality.<\/p>\n<p>If you have any queries or suggestions related to erasure coding in HDFS, please leave a comment in the section below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog is all about HDFS Erasure Coding. In this blog we will discuss the concept of Erasure Coding in Hadoop, issues of old replication scheme. Two algorithms for Hadoop erasure coding such as&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73160,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[564],"tags":[538,457,541,624,625,557,626],"class_list":["post-2000","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hdfs","tag-apache-hadoop","tag-big-data","tag-hadoop","tag-hadoop-erasure-coding","tag-hadoop-tutoria","tag-hdfs","tag-hdfs-erasure-coding-in-hadoop"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>HDFS Erasure Coding in Big Data Hadoop - TechVidvan<\/title>\n<meta name=\"description\" content=\"HDFS Erasure Coding tutorial cover the need of Hadoop Erasure coding,erasure coding Algorithms in HDFS,XOR,Reed Solomon Algorithm, Erasure Coding Advantages\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"HDFS Erasure Coding in Big Data Hadoop - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"HDFS Erasure Coding tutorial cover the need of Hadoop Erasure coding,erasure coding Algorithms in HDFS,XOR,Reed Solomon Algorithm, Erasure Coding Advantages\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-05T07:19:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"HDFS Erasure Coding in Big Data Hadoop - TechVidvan","description":"HDFS Erasure Coding tutorial cover the need of Hadoop Erasure coding,erasure coding Algorithms in HDFS,XOR,Reed Solomon Algorithm, Erasure Coding Advantages","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/","og_locale":"en_US","og_type":"article","og_title":"HDFS Erasure Coding in Big Data Hadoop - TechVidvan","og_description":"HDFS Erasure Coding tutorial cover the need of Hadoop Erasure coding,erasure coding Algorithms in HDFS,XOR,Reed Solomon Algorithm, Erasure Coding Advantages","og_url":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-05T07:19:59+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"HDFS Erasure Coding in Big Data Hadoop","datePublished":"2017-10-05T07:19:59+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/"},"wordCount":926,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg","keywords":["apache hadoop","big data","hadoop","Hadoop Erasure Coding","Hadoop Tutoria;","hdfs","HDFS Erasure coding in hadoop"],"articleSection":["HDFS Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/","url":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/","name":"HDFS Erasure Coding in Big Data Hadoop - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg","datePublished":"2017-10-05T07:19:59+00:00","description":"HDFS Erasure Coding tutorial cover the need of Hadoop Erasure coding,erasure 
coding Algorithms in HDFS,XOR,Reed Solomon Algorithm, Erasure Coding Advantages","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/HDFS-Erasure-Coding-01.jpg","width":1200,"height":628,"caption":"HDFS Erasure Coding in Big Data Hadoop"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-hdfs-erasure-coding\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"HDFS Erasure Coding in Big Data Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2000"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2000\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73160"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}