HDFS Erasure Coding in Big Data Hadoop

This blog is all about HDFS Erasure Coding. We will discuss the concept of erasure coding in Hadoop, the problems with the old replication scheme, and the two algorithms used for Hadoop erasure coding: the XOR algorithm and the Reed-Solomon algorithm.

Finally, we will look at the architecture and the advantages of erasure coding in Hadoop HDFS.

Problems with the Old Replication Scheme

HDFS Erasure Coding is a new feature introduced to reduce storage overhead by approximately 50% compared to 3x replication. Hadoop HDFS replicates each block three times for various purposes. Replication is a very simple form of redundancy that shields against DataNode failure.

Along with its pros, replication has a serious con: it is very expensive. 3x replication incurs a 200% overhead in storage space and other resources. For datasets with low I/O activity, the additional replicas are rarely accessed during normal operation, yet they still consume the same resources.

This is the reason Hadoop erasure coding came into existence. It provides the same level of fault tolerance while using far less storage space, with roughly 50% storage overhead.

When comparing different storage schemes, the important considerations are fault tolerance and storage efficiency.

In N-way replication, there is N-1 fault tolerance with 1/N storage efficiency.
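To make the comparison concrete, here is a minimal Python sketch of these two quantities; the 3-way replication and 6+3 (6 data cells, 3 parity cells) figures are only illustrative examples:

```python
# A minimal sketch comparing fault tolerance and storage efficiency of
# N-way replication with an erasure-coded layout of k data + m parity cells.
# The concrete numbers below (3-way replication, 6+3 EC) are illustrative only.

def replication_scheme(n):
    """N-way replication: survives N-1 failures, stores N copies of each block."""
    return {"fault_tolerance": n - 1, "storage_efficiency": 1 / n}

def erasure_coding_scheme(k, m):
    """Erasure coding: k data cells + m parity cells, survives up to m failures."""
    return {"fault_tolerance": m, "storage_efficiency": k / (k + m)}

print("3-way replication:", replication_scheme(3))        # 2 failures, ~33% efficiency (200% overhead)
print("6+3 erasure coding:", erasure_coding_scheme(6, 3))  # 3 failures, ~67% efficiency (50% overhead)
```

For 3-way replication this gives a fault tolerance of 2 at about 33% storage efficiency (200% overhead), while a 6+3 erasure coding group tolerates 3 failures at about 67% efficiency (50% overhead).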

What is HDFS Erasure Coding in Hadoop?

HDFS erasure coding uses the same idea as RAID. RAID implements EC through striping: the data is logically divided into blocks, and these blocks are stored on different disks. For each stripe of blocks, a parity block is calculated and stored. This is the encoded form of the data, and through the parity the data can be recovered from errors.

For fault tolerance, EC extends a message with redundant data. HDFS erasure coding operates on uniformly sized data cells. The codec takes a number of data cells as input and produces parity cells as output.

This whole process is called encoding. The data cells and parity cells together are called an erasure coding group. The process of reconstructing a lost data cell from the remaining cells is known as decoding.
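As a rough illustration of striping, the Python sketch below cuts a byte stream into fixed-size cells and deals them round-robin across k data blocks. The cell size and k used here are tiny, made-up values (real HDFS cells are much larger, on the order of 1 MB), and the parity computation that would follow each full stripe is left out:

```python
# A minimal sketch of how striping splits a byte stream into fixed-size cells
# and lays them out round-robin across k data blocks. CELL_SIZE and K are
# made-up illustrative values, far smaller than what HDFS actually uses.

CELL_SIZE = 4   # bytes per cell (illustrative)
K = 3           # number of data blocks in the striping group (illustrative)

def stripe(data: bytes, cell_size: int, k: int):
    """Split data into cells and distribute them round-robin over k blocks."""
    cells = [data[i:i + cell_size] for i in range(0, len(data), cell_size)]
    blocks = [[] for _ in range(k)]
    for index, cell in enumerate(cells):
        blocks[index % k].append(cell)
    return blocks

payload = b"ABCDEFGHIJKLMNOPQRSTUVWX"
for i, block in enumerate(stripe(payload, CELL_SIZE, K)):
    print(f"data block {i}: {block}")
```

After every k data cells written this way, the codec would compute the corresponding parity cells to complete one erasure coding group.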

Two algorithms available for HDFS Erasure Coding are as follows:

a) XOR Algorithm

It is the simplest implementation of Hadoop erasure coding.

Let's assume X, Y, and Z are data cells. The parity cell is the XOR of these three data cells: X ⊕ Y ⊕ Z. The XOR operation generates only one parity cell, and if any one cell is lost, it can be recovered from the remaining data cells and the parity cell.

XOR is very limited: since it produces only one parity cell, it can tolerate only one failure in a group of size n.

With XOR, the fault tolerance is 1 and the storage efficiency is (n-1)/n when the group size is n.
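Here is a minimal Python sketch of the XOR scheme described above, using three small made-up data cells; it shows both encoding (computing the parity cell) and decoding (rebuilding one lost cell):

```python
# A minimal sketch of XOR-based parity over three equal-sized data cells:
# parity = X xor Y xor Z, and any single lost cell can be rebuilt by
# XOR-ing the parity with the surviving cells.

def xor_cells(*cells: bytes) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    result = bytearray(len(cells[0]))
    for cell in cells:
        for i, b in enumerate(cell):
            result[i] ^= b
    return bytes(result)

x, y, z = b"AAAA", b"BBBB", b"CCCC"   # made-up data cells
parity = xor_cells(x, y, z)           # encoding: one parity cell for the group

# Suppose cell y is lost; decode it from the remaining cells plus the parity.
recovered_y = xor_cells(x, z, parity)
assert recovered_y == y
print("recovered:", recovered_y)
```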

b) Reed-Solomon Algorithm

Reed-Solomon addresses the limitation of the XOR operation. It uses linear algebra to generate multiple parity cells. RS uses two parameters, k and m: k is the number of data cells and m is the number of parity cells.

RS works by multiplying the k data cells with a generator matrix (GT) to produce an extended codeword with k data cells and m parity cells. A storage failure can be recovered by multiplying the inverse of the generator matrix with the extended codeword, as long as k out of the k + m cells are available.

With Reed-Solomon, the fault tolerance is up to m cells and the storage efficiency is k/(k+m), where k is the number of data cells and m is the number of parity cells.
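The sketch below shows the same generator-matrix idea in Python with NumPy, but over ordinary floating-point numbers instead of the Galois field arithmetic a real Reed-Solomon codec uses; the values of k, m, the data cells, and the parity rows are made up purely for illustration:

```python
# A hedged sketch of the Reed-Solomon idea using plain linear algebra over
# the reals with NumPy. Real RS codecs work over a Galois field (e.g. GF(2^8));
# floating point is used here only to show the structure:
# codeword = G * data, and any k surviving cells let you re-solve for the data.

import numpy as np

k, m = 3, 2                       # k data cells, m parity cells (illustrative)

# Generator matrix G ((k+m) x k): identity on top keeps the data cells as-is,
# Vandermonde-style rows underneath produce the parity cells.
G = np.vstack([np.eye(k),
               [[1, 1, 1],        # parity row: evaluation point 1
                [1, 2, 4]]])      # parity row: evaluation point 2

data = np.array([5.0, 7.0, 2.0])  # the k data cells (made-up values)
codeword = G @ data               # encoding: k data cells + m parity cells

# Suppose cells 0 and 3 are lost; any k surviving rows of G remain invertible,
# so solving that k x k system recovers the original data cells.
surviving = [1, 2, 4]
recovered = np.linalg.solve(G[surviving], codeword[surviving])
assert np.allclose(recovered, data)
print("recovered data cells:", recovered)
```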

Design Decision and Architecture

EC striping has several advantages: it enables online erasure coding, writing data immediately in EC form without a separate conversion phase, and it naturally handles small files, since the cells of even a small file are spread across multiple DataNodes.

EC added many new components to HDFS, including NameNode extensions to manage striped block groups, client extensions to read and write striped data, and DataNode extensions to reconstruct failed erasure-coded blocks in the background.

Benefits of Erasure Coding

Erasure coding provides the same fault tolerance as replication while roughly halving the storage overhead: around 50% instead of the 200% overhead of 3x replication. This makes it especially attractive for large, rarely accessed (cold) datasets.

Conclusion

In conclusion, we can say that HDFS erasure coding reduces the storage overhead by about 50%. EC reduces the overhead because redundancy is stored as compact parity cells rather than full replicas. Hence, this HDFS feature empowers Apache Hadoop functionality.

If you have any query or suggestion related to erasure coding in HDFS, please leave a comment in the section below.
