{"id":476,"date":"2017-10-07T06:28:47","date_gmt":"2017-10-07T06:28:47","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=476"},"modified":"2017-10-07T06:28:47","modified_gmt":"2017-10-07T06:28:47","slug":"mapreduce-job-optimization-techniques","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/","title":{"rendered":"6 Best MapReduce Job Optimization Techniques"},"content":{"rendered":"<p>Performance tuning helps you get the most out of your <strong>Hadoop<\/strong> cluster. In this blog, we discuss techniques for optimizing MapReduce jobs.<\/p>\n<p>In this MapReduce tutorial, we provide 6 important tips for MapReduce job optimization, such as proper configuration of your cluster, LZO compression usage, and proper tuning of the number of MapReduce tasks.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73206\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg\" alt=\"MapReduce Job Optimization Techniques\" width=\"1201\" height=\"629\" \/><\/a><\/p>\n<h3>Tips for MapReduce Job Optimization<\/h3>\n<p>Below are some MapReduce job optimization techniques that can help you improve MapReduce job performance.<\/p>\n<h4>1. Proper configuration of your cluster<\/h4>\n<ul>\n<li>Mount DFS and MapReduce storage with the <strong>noatime<\/strong> option. This disables access-time updates and thus improves I\/O performance.<\/li>\n<li>Avoid RAID on TaskTracker and DataNode machines. It generally reduces performance.<\/li>\n<li>Ensure that you have configured <strong>mapred.local.dir<\/strong> and <strong>dfs.data.dir<\/strong> to point to one directory on each of your disks. 
This ensures that all of your I\/O capacity is used.<\/li>\n<li>Use monitoring software to graph swap usage and network usage. If you see that swap is being used, reduce the amount of RAM allocated to each task in <strong>mapred.child.java.opts<\/strong>.<\/li>\n<li>Make sure that you have smart monitoring of the health status of your disk drives. This is one of the important practices for MapReduce performance tuning.<\/li>\n<\/ul>\n<h4>2. LZO compression usage<\/h4>\n<p>Compressing intermediate data is almost always a good idea. Every Hadoop job that generates a non-negligible amount of map output will benefit from intermediate data compression with LZO.<\/p>\n<p>Although LZO adds a little CPU overhead, it saves time by reducing the amount of disk I\/O during the shuffle.<\/p>\n<p>Set <strong>mapred.compress.map.output<\/strong> to true to enable compression of map output, and set <strong>mapred.map.output.compression.codec<\/strong> to an LZO codec class to select LZO.<\/p>\n<h4>3. Proper tuning of the number of MapReduce tasks<\/h4>\n<ul>\n<li>If each task in a MapReduce job takes less than 30-40 seconds, reduce the number of tasks. Every <strong>mapper<\/strong> or <strong>reducer<\/strong> process involves the following: first the JVM is started (loaded into memory) and initialized; then, after the mapper or reducer finishes processing, the JVM is torn down. These JVM steps are costly. If a mapper runs for only 20-30 seconds, the time spent starting, initializing, and stopping the JVM can be a considerable share of the task. So it is strongly recommended that each task run for at least 1 minute.<\/li>\n<li>If a job has more than 1TB of input, consider increasing the block size of the input dataset to 256M or even 512M so that the number of tasks is smaller. 
You can change the block size with the command <strong>hadoop distcp -Ddfs.block.size=$[256*1024*1024] \/path\/to\/inputdata \/path\/to\/inputdata-with-largeblocks<\/strong><\/li>\n<li>As long as each task runs for at least 30-40 seconds, increase the number of mapper tasks to some multiple of the number of mapper slots in the cluster.<\/li>\n<li>Don\u2019t run too many reduce tasks. For most jobs, the number of reduce tasks should be equal to or a bit less than the number of reduce slots in the cluster.<\/li>\n<\/ul>\n<h4>4. Combiner between Mapper and Reducer<\/h4>\n<p>If your algorithm involves computing aggregates of any sort, use a <strong>Combiner<\/strong>. The Combiner performs some aggregation before the data hits the reducer.<\/p>\n<p>The Hadoop MapReduce framework runs the combiner intelligently to reduce the amount of data that must be written to disk and transferred between the Map and Reduce stages of computation.<\/p>\n<h4>5. Usage of most appropriate and compact writable type for data<\/h4>\n<p>New big data users, and users switching from Hadoop Streaming to Java MapReduce, often use the Text writable type unnecessarily. Text can be convenient, but converting numeric data to and from UTF-8 strings is inefficient and can actually make up a significant portion of CPU time. For numeric data, prefer a binary Writable such as IntWritable or FloatWritable.<\/p>\n<h4>6. Reuse of Writables<\/h4>\n<p>One very common mistake among MapReduce users is to allocate a new Writable object for every output from a mapper or reducer. For example, consider the following word-count mapper implementation:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-theme=\"classic\">public void map(...) {\n...\nfor (String word: words) {\n\/\/ Allocates two new objects for every word\noutput.collect(new Text(word), new IntWritable(1));\n}\n}<\/pre>\n<p>This implementation causes allocation of thousands of short-lived objects. 
While the Java garbage collector does a reasonable job of dealing with this, it is more efficient to write:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">class MyMapper ... {\n\/\/ Allocated once and reused for every output record\nText wordText = new Text();\nIntWritable one = new IntWritable(1);\npublic void map(...) {\n...\nfor (String word: words) {\nwordText.set(word);\noutput.collect(wordText, one);\n}\n}\n}<\/pre>\n<h3>Conclusion<\/h3>\n<p>Hence, there are various MapReduce job optimization techniques that help you optimize a MapReduce job, such as using a combiner between mapper and reducer, LZO compression, proper tuning of the number of MapReduce tasks, and reuse of Writables.<\/p>\n<p>If you know any other technique for MapReduce job optimization, do let us know in the comment section below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Performance tuning will help in optimizing your Hadoop performance. In this blog, we are going to discuss all those techniques for MapReduce Job optimizations. 
In this MapReduce tutorial, we will provide you 6 important&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73206,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[538,457,595,541,463,609],"class_list":["post-476","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-apache-hadoop","tag-big-data","tag-big-data-hadoop-mapreduce","tag-hadoop","tag-mapreduce","tag-mapreduce-job-optimization"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>6 Best MapReduce Job Optimization Techniques - TechVidvan<\/title>\n<meta name=\"description\" content=\"6 important tips for MapReduce Job Optimization helps to improve MapReduce Performance-LZO compression,number of MapReduce tasks,use of Writables &amp; Combiner\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"6 Best MapReduce Job Optimization Techniques - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"6 important tips for MapReduce Job Optimization helps to improve MapReduce Performance-LZO compression,number of MapReduce tasks,use of Writables &amp; Combiner\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-07T06:28:47+00:00\" \/>\n<meta 
property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1201\" \/>\n\t<meta property=\"og:image:height\" content=\"629\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"6 Best MapReduce Job Optimization Techniques - TechVidvan","description":"6 important tips for MapReduce Job Optimization helps to improve MapReduce Performance-LZO compression,number of MapReduce tasks,use of Writables & Combiner","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/","og_locale":"en_US","og_type":"article","og_title":"6 Best MapReduce Job Optimization Techniques - TechVidvan","og_description":"6 important tips for MapReduce Job Optimization helps to improve MapReduce Performance-LZO compression,number of MapReduce tasks,use of Writables & 
Combiner","og_url":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-07T06:28:47+00:00","og_image":[{"width":1201,"height":629,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"6 Best MapReduce Job Optimization Techniques","datePublished":"2017-10-07T06:28:47+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/"},"wordCount":753,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg","keywords":["apache hadoop","big data","Big Data Hadoop MapReduce","hadoop","MapReduce","MapReduce job optimization"],"articleSection":["MapReduce 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/","url":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/","name":"6 Best MapReduce Job Optimization Techniques - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg","datePublished":"2017-10-07T06:28:47+00:00","description":"6 important tips for MapReduce Job Optimization helps to improve MapReduce Performance-LZO compression,number of MapReduce tasks,use of Writables & Combiner","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/MapReduce-Job-Optimization-Techniques-01.jpg","width":1201,"height":629,"caption":"MapReduce Job Optimization 
Techniques"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-optimization-techniques\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"6 Best MapReduce Job Optimization Techniques"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/476","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=476"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/476\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73206"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=476"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=476"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=476"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}