{"id":2003,"date":"2017-10-06T12:21:32","date_gmt":"2017-10-06T12:21:32","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=431"},"modified":"2017-10-06T12:21:32","modified_gmt":"2017-10-06T12:21:32","slug":"data-locality-in-hadoop-mapreduce","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/","title":{"rendered":"Introduction to Data Locality in Hadoop MapReduce"},"content":{"rendered":"<p>In this <strong>Hadoop tutorial,<\/strong> we explain the concept of data locality in Hadoop.<\/p>\n<p>We start with an introduction to data locality in Hadoop MapReduce, then discuss the requirements for Hadoop data locality, the categories of data locality in MapReduce, and data locality optimization.<\/p>\n<p>Finally, we look at the advantages of the Hadoop data locality principle in this MapReduce\u00a0tutorial.<\/p>\n<h3>What is Data Locality in Hadoop MapReduce?<\/h3>\n<p>Data locality in Hadoop is the practice of moving the computation close to where the data resides instead of moving large volumes of data to the computation. This reduces overall network congestion and increases the overall throughput of the system.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/data-locality-in-hadoop.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73100\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/data-locality-in-hadoop.gif\" alt=\"data locality in hadoop\" width=\"802\" height=\"420\" \/><\/a><\/p>\n<p>A major <strong>drawback of\u00a0Hadoop<\/strong>\u00a0was cross-switch network traffic caused by the huge amount of data being moved. Data locality was introduced to overcome this drawback.<\/p>\n<p>In Hadoop, <strong>HDFS<\/strong>\u00a0stores datasets. 
The framework divides datasets into blocks and stores them across the DataNodes.\u00a0When a client runs a MapReduce job, the scheduler (the JobTracker in Hadoop 1, or YARN together with the MapReduce ApplicationMaster in Hadoop 2 and later) uses the block locations reported by the NameNode to run the map tasks on the DataNodes that hold the input data for that job.<\/p>\n<h3>Requirement for Hadoop Data Locality<\/h3>\n<p>A Hadoop cluster needs to satisfy the following conditions to get the full benefit of data locality:<\/p>\n<ul>\n<li>First, the Hadoop cluster should have an appropriate topology, and the Hadoop code should be able to read data locally.<\/li>\n<li>Second, Apache Hadoop should be aware of the topology of the nodes where tasks are executed, and it should know where the data is located.<\/li>\n<\/ul>\n<h3>Categories of Data locality in Hadoop<\/h3>\n<p>The categories of Hadoop data locality are as follows:<\/p>\n<h4>1. Data local data locality in Hadoop<\/h4>\n<p>Here the data is located on the same node as the <strong>mapper<\/strong> working on it, so the data is as close to the computation as possible. Data-local execution is the most preferred scenario.<\/p>\n<h4>2. Intra-Rack data locality in Hadoop<\/h4>\n<p>It is not always possible to execute the mapper on the same DataNode as the data because of resource constraints. In this case, it is preferred to run the\u00a0mapper\u00a0on a different node in the same rack.<\/p>\n<h4><strong>3. Inter\u2013Rack data locality in Hadoop<\/strong><\/h4>\n<p>Sometimes it is also not possible to execute the mapper on a different node in the same rack. In that situation, the mapper is executed on a node in a different rack. Inter\u2013rack data locality is the least preferred scenario.<\/p>\n<h3>Hadoop Data locality Optimization<\/h3>\n<p>Data locality is the main <strong>advantage of Hadoop<\/strong> MapReduce. 
\u00a0In practice, however, it is not always achievable, for reasons such as heterogeneous clusters, speculative execution, data distribution and placement, and data layout.<\/p>\n<p>These challenges become more prevalent in large clusters: the more data nodes and data a cluster has, the lower the locality tends to be.<\/p>\n<p>In larger clusters, some nodes are newer and faster than others, which throws the data-to-compute ratio out of balance. Thus, large clusters tend not to be completely homogeneous.<\/p>\n<p>With Hadoop speculative execution, a duplicate task may run on a node where the data is not local, yet it still uses that node\u2019s compute power. Data layout and placement are another major cause. Non-local data processing also puts a strain on the network, which limits scalability, so the network becomes the bottleneck.<\/p>\n<p>We can improve data locality by first detecting which jobs have degraded over time or have a data locality problem. Solving the problem is more complex and can involve changing the data placement and data layout, or using a different scheduler.<\/p>\n<p>After that, we have to verify whether a new execution of the same workload has a better data locality ratio.<\/p>\n<h3>Advantages of data locality in Hadoop<\/h3>\n<ul>\n<li><strong> High Throughput &#8211;\u00a0<\/strong>Data locality in Hadoop increases the overall throughput of the system.<\/li>\n<li><strong> Faster Execution &#8211;\u00a0<\/strong>With data locality, the framework moves code to the node where the data resides instead of moving large data to the node, which makes Hadoop faster. Because the program is usually much smaller than the data, moving the data would be the bottleneck in network transfer.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>In conclusion, data locality in Hadoop reduces network congestion, which improves the overall execution of the system and makes Hadoop faster.<\/p>\n<p>If you found this post helpful or have any questions, leave a comment in the comment section below. 
We will be glad to answer them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Hadoop tutorial, we are going to explain to you the concept of Data locality in Hadoop. First of all we will see the introduction to MapReduce Data locality in Hadoop, then we will&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73101,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[538,457,633,541,634,543],"class_list":["post-2003","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-apache-hadoop","tag-big-data","tag-data-locality-in-hadoop","tag-hadoop","tag-hadoop-data-locality","tag-hadoop-tutorial"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Introduction to Data Locality in Hadoop MapReduce - TechVidvan<\/title>\n<meta name=\"description\" content=\"MapReduce Data Locality principle covers need of data locality in Hadoop, Data locality Optimization,Data locality categories, Hadoop data Locality benefits\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introduction to Data Locality in Hadoop MapReduce - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"MapReduce Data Locality principle covers need of data locality in Hadoop, Data locality Optimization,Data locality categories, Hadoop data Locality benefits\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/\" \/>\n<meta property=\"og:site_name\" 
content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-06T12:21:32+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Introduction to Data Locality in Hadoop MapReduce - TechVidvan","description":"MapReduce Data Locality principle covers need of data locality in Hadoop, Data locality Optimization,Data locality categories, Hadoop data Locality benefits","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/","og_locale":"en_US","og_type":"article","og_title":"Introduction to Data Locality in Hadoop MapReduce - TechVidvan","og_description":"MapReduce Data Locality principle covers need of data locality in Hadoop, Data locality Optimization,Data locality categories, Hadoop data Locality benefits","og_url":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-06T12:21:32+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Introduction to Data Locality in Hadoop MapReduce","datePublished":"2017-10-06T12:21:32+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/"},"wordCount":719,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg","keywords":["apache hadoop","big data","Data Locality in hadoop","hadoop","Hadoop data locality","hadoop tutorial"],"articleSection":["MapReduce Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/","url":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/","name":"Introduction to Data Locality in Hadoop MapReduce - 
TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg","datePublished":"2017-10-06T12:21:32+00:00","description":"MapReduce Data Locality principle covers need of data locality in Hadoop, Data locality Optimization,Data locality categories, Hadoop data Locality benefits","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/data-locality-in-hadoop.jpg","width":802,"height":420,"caption":"Data Locality in hadoop mapreduce"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/data-locality-in-hadoop-mapreduce\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Introduction to Data Locality in Hadoop MapReduce"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2003","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2003"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2003\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73101"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2003"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2003"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2003"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}