
20 Notable Differences Between Hadoop 2.x and Hadoop 3.x

Hadoop 2.x vs Hadoop 3.x: feature-wise differences

The objective of this Hadoop tutorial is to give you a clearer understanding of how the Hadoop versions differ. In this blog we cover the top 20 differences between Hadoop 2.x and Hadoop 3.x.

The comparison between Hadoop 2 and Hadoop 3 is organized feature by feature.

Differences Between Hadoop 2.x and Hadoop 3.x

Apache Hadoop is an open-source software framework for the distributed storage and processing of very large data sets.

Hadoop 3.x was introduced to overcome the limitations of Hadoop 2.x. It adds several new features while the existing features remain in use.

A detailed feature-wise comparison of Hadoop 2.x and Hadoop 3.x is given below:

a. License

b. Minimum supported version of Java

c. Fault Tolerance

d. Data Balancing

e. Storage Scheme

f. Storage Overhead

g. Storage Overhead Example

h. YARN Timeline Service

j. Default Ports Range

k. Tools

l. Compatible File System

m. Datanode Resources

n. MR API Compatibility

o. Support for Microsoft

p. Slots/container

q. Single point of failure

r. HDFS Federation

s. Scalability

t. HDFS Snapshot

u. Platform

Conclusion

In conclusion, Hadoop 3.0 added new features such as erasure coding for fault tolerance. Erasure coding also reduces the storage overhead from 200% (with 3x replication) to 50%.
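The storage overhead figures above can be checked with a little arithmetic. The sketch below is only an illustration, not taken from the Hadoop code base: it assumes the default replication factor of 3 for Hadoop 2.x and a Reed-Solomon layout with 6 data cells and 3 parity cells for Hadoop 3.x erasure coding. The function name `storage_overhead_percent` is hypothetical.

```python
def storage_overhead_percent(data_units: int, redundant_units: int) -> float:
    """Return the extra storage as a percentage of the original data size."""
    return 100.0 * redundant_units / data_units


# Assumption: 3x replication keeps 1 original block plus 2 extra copies.
replication = storage_overhead_percent(data_units=1, redundant_units=2)

# Assumption: Reed-Solomon (6, 3) stores 6 data cells plus 3 parity cells.
erasure_coding = storage_overhead_percent(data_units=6, redundant_units=3)

print(f"3x replication overhead: {replication:.0f}%")
print(f"RS(6,3) erasure coding overhead: {erasure_coding:.0f}%")
```

Under these assumptions, 6 blocks of data occupy 18 blocks of disk space with replication but only 9 blocks with erasure coding.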

It also introduced a new command-line tool, the Disk Balancer, which evens out data across the disks within a DataNode. Overall, these changes make Hadoop 3.x an improvement over Hadoop 2.x.

If you find any other differences between Hadoop 2.x and Hadoop 3.x, do let us know in the comment section.
