Top 6 Features of HDFS – A Hadoop HDFS Tutorial
In our previous blog we learned Hadoop HDFS in detail; in this blog, we cover the features of HDFS. Hadoop HDFS offers features such as fault tolerance, replication, reliability, high availability, distributed storage, and scalability. This Hadoop HDFS tutorial discusses each of them.
If you are not familiar with Hadoop, you can refer to our Hadoop Tutorial to learn Apache Hadoop in detail.
Hadoop Distributed File System (HDFS) is the primary storage system of Hadoop. It stores very large files on a cluster of commodity hardware. HDFS is based on GFS (the Google File System). It stores data reliably even in the case of hardware failure. HDFS also provides high-throughput access to application data by reading blocks in parallel. According to one prediction, 75% of the data available on the planet would reside in HDFS by the end of 2017.
3. Six Important Features of HDFS
After the introduction to Hadoop HDFS, let’s now discuss the most important features of HDFS.
3.1. Fault Tolerance
Fault tolerance in Hadoop HDFS is the ability of the system to keep working under unfavorable conditions. HDFS is highly fault-tolerant. The Hadoop framework divides data into blocks and then creates multiple copies of each block on different machines in the cluster. So, when any machine in the cluster goes down, a client can still access its data from another machine that holds a copy of the same blocks.
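The idea can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names `place_replicas` and `readable_from` and the node names are made up for illustration, and the simple round-robin placement stands in for the rack-aware policy that real HDFS uses.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative only: real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Take `replication` consecutive nodes, starting at a rotating offset,
        # so the copies of one block always land on different machines.
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_from(placement, block, failed_nodes):
    """Return the nodes that still hold a live copy of `block`."""
    return [n for n in placement[block] if n not in failed_nodes]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes)

# Simulate node2 going down and list where each block can still be read.
survivors = {b: readable_from(placement, b, {"node2"}) for b in placement}
for block, holders in survivors.items():
    print(block, "->", holders)
```

With three copies per block, losing one machine still leaves at least two readable copies of every block in this model.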
3.2. High Availability
Hadoop HDFS is a highly available file system. In HDFS, data is replicated among the nodes of the Hadoop cluster by creating replicas of the blocks on other slave nodes in the cluster. So, whenever users want to access this data, they can read it from any of the slaves that hold its blocks. In unfavorable situations such as the failure of a node, users can still access their data from the other nodes, because duplicate copies of the blocks are present elsewhere in the HDFS cluster.
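A rough way to see why replicas improve availability is a back-of-the-envelope calculation. The sketch below assumes that nodes fail independently and that each node is down with the same probability; both are simplifications, and the function name and the 1% figure are hypothetical values chosen for illustration.

```python
def block_unavailable_probability(node_down_prob, replication=3):
    """Probability that every replica of a block sits on a node that is down.

    Assumes independent node failures, which real clusters only approximate.
    """
    if not 0.0 <= node_down_prob <= 1.0:
        raise ValueError("node_down_prob must be between 0 and 1")
    return node_down_prob ** replication


# Compare one, two and three copies when each node is down 1% of the time.
for r in (1, 2, 3):
    p = block_unavailable_probability(0.01, r)
    print(f"replication={r}: P(block unavailable) = {p:.6f}")
```

Under these assumptions, each extra replica multiplies the chance of a block being unreachable by the per-node failure probability.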
3.3. High Reliability
HDFS provides reliable data storage. It can store data in the range of hundreds of petabytes. HDFS divides the data into blocks, and the Hadoop framework stores these blocks on the nodes of the HDFS cluster. HDFS achieves reliability by replicating every block, which is also what provides fault tolerance: by default it keeps 3 replicas of each block on different nodes. If a node holding the data goes down, a user can still read that data from the other nodes, so a single node failure does not cause data loss. Thus, HDFS is highly reliable.
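The cost of this reliability is extra disk space. The following sketch estimates how many blocks and how much raw storage a file needs. It assumes a 128 MB block size (the default in recent Hadoop releases, though not stated in this tutorial) and the default of 3 replicas; `storage_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
MB = 1024 ** 2


def storage_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, total block replicas, raw bytes consumed)."""
    # Ceiling division: a partly filled last block still counts as a block.
    num_blocks = -(-file_size_bytes // block_size_bytes)
    total_replicas = num_blocks * replication
    # The last block only occupies as much disk as the data it holds,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return num_blocks, total_replicas, raw_bytes


blocks, replicas, raw = storage_footprint(1000 * MB)
print(f"{blocks} blocks, {replicas} replicas, {raw // MB} MB of raw storage")
```

For a 1000 MB file this gives 8 blocks, 24 block replicas and 3000 MB of raw disk space.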
3.4. Replication
Data replication is one of the most important features of HDFS. Replication guards against data loss in unfavorable conditions such as hardware failure or node crashes. HDFS maintains replication continuously: it checks at regular intervals that every block has the required number of replicas and creates new replicas on other machines in the cluster when copies are missing. So, when any node goes down, the user can access the data from other machines. Data is lost only if every replica of a block becomes unavailable at the same time, which is very unlikely.
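The repair step can be illustrated with a small simulation. This is only a sketch of the idea: `rereplicate` and the node names are hypothetical, and real HDFS additionally takes rack layout and free space into account when it chooses where to copy a block.

```python
def rereplicate(placement, live_nodes, replication=3):
    """Drop replicas on dead nodes and copy under-replicated blocks to live nodes.

    `placement` maps a block id to the list of nodes holding it.
    Returns a new mapping; the input is left unchanged.
    """
    live = list(live_nodes)
    repaired = {}
    for block, holders in placement.items():
        current = [n for n in holders if n in live]
        if not current:
            # No surviving copy to replicate from.
            raise RuntimeError(f"{block}: all replicas lost, cannot recover")
        # Candidate targets: live nodes that do not already hold this block.
        candidates = [n for n in live if n not in current]
        missing = max(0, replication - len(current))
        repaired[block] = current + candidates[:missing]
    return repaired


placement = {
    "blk_0": ["node1", "node2", "node3"],
    "blk_1": ["node2", "node3", "node4"],
}
# node2 has failed; node5 is a healthy node with free space.
repaired = rereplicate(placement, ["node1", "node3", "node4", "node5"])
print(repaired)
```

After the repair, every block is back at three copies and none of them lives on the failed node.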
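3.5. Scalability
Hadoop HDFS stores data on multiple nodes in the cluster. So, whenever requirements increase, you can scale the cluster. Two scalability mechanisms are available in HDFS: vertical scalability, which adds more resources (such as disks) to the existing nodes, and horizontal scalability, which adds more nodes to the cluster.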
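A simple capacity estimate shows how the two approaches compare. The numbers below (10 nodes with 12 TB of disk each) are invented for illustration, `usable_capacity_tb` is a hypothetical helper, and the estimate ignores space reserved for non-HDFS use.

```python
def usable_capacity_tb(num_nodes, disk_tb_per_node, replication=3):
    """Approximate usable capacity: raw disk space divided by the replication factor."""
    return num_nodes * disk_tb_per_node / replication


base = usable_capacity_tb(num_nodes=10, disk_tb_per_node=12)
# Vertical scaling: keep 10 nodes but double the disk in each one.
vertical = usable_capacity_tb(num_nodes=10, disk_tb_per_node=24)
# Horizontal scaling: keep 12 TB per node but double the number of nodes.
horizontal = usable_capacity_tb(num_nodes=20, disk_tb_per_node=12)

print(f"base={base} TB, vertical={vertical} TB, horizontal={horizontal} TB")
```

Both routes reach the same capacity in this example; horizontal scaling also adds machines that can serve reads in parallel.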
3.6. Distributed Storage
All the features of HDFS are achieved via distributed storage and replication. HDFS stores data in a distributed manner across the nodes. In Hadoop, data is divided into blocks that are stored on the nodes of the HDFS cluster. HDFS then creates replicas of every block and stores them on other nodes. When a single machine in the cluster crashes, we can still access our data from the other nodes that hold its replicas.
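To make the block-level view concrete, here is a minimal sketch that splits a byte string into fixed-size blocks, spreads them over three nodes, and reads the file back. Everything in it is illustrative: the function names are hypothetical, the 16-byte block size is tiny so the output stays readable, and replication is left out to keep the focus on distribution. The `index` list plays the role of the block-location metadata that HDFS keeps on the NameNode.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut `data` into consecutive chunks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks, nodes):
    """Store block i on node i % len(nodes); return per-node storage and an index."""
    storage = {n: {} for n in nodes}
    index = []  # index[i] is the node that holds block i
    for i, block in enumerate(blocks):
        node = nodes[i % len(nodes)]
        storage[node][i] = block
        index.append(node)
    return storage, index


def read_file(storage, index):
    """Fetch every block from the node listed in the index and join them in order."""
    return b"".join(storage[node][i] for i, node in enumerate(index))


data = b"HDFS stores large files as blocks spread over many machines."
blocks = split_into_blocks(data, block_size=16)
storage, index = distribute(blocks, ["node1", "node2", "node3"])

print(index)
print(read_file(storage, index) == data)
```

No single node holds the whole file, yet the reader can reassemble it by following the index.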
In conclusion, HDFS is highly fault-tolerant. It reliably stores huge amounts of data despite hardware failures, and it also provides high scalability and high availability. In this way HDFS empowers Hadoop's functionality. If you know of any other feature of HDFS, please share it with us in the comment section below.