{"id":2004,"date":"2017-10-07T07:17:18","date_gmt":"2017-10-07T07:17:18","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=479"},"modified":"2017-10-07T07:17:18","modified_gmt":"2017-10-07T07:17:18","slug":"mapreduce-job-execution-flow","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/","title":{"rendered":"Hadoop MapReduce Job Execution flow Chart"},"content":{"rendered":"<p>In this <strong>Hadoop<\/strong> blog, we walk through the end-to-end MapReduce job execution flow and describe in detail each component that takes part in it.<\/p>\n<p>This blog will help you answer three questions: how does Hadoop MapReduce work, how does data flow in MapReduce, and how is a MapReduce job executed in Hadoop?<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/mapreduce-job-execution-flow.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73204\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/mapreduce-job-execution-flow.jpg\" alt=\"mapreduce job execution flow\" width=\"802\" height=\"420\" \/><\/a><\/p>\n<h3>What is MapReduce?<\/h3>\n<p><strong>Hadoop MapReduce<\/strong> is the data processing layer of Hadoop. It processes huge amounts of structured and unstructured data stored in HDFS. MapReduce processes data in parallel by dividing the job into a set of independent tasks. This parallel processing improves speed and reliability.<\/p>\n<p>Hadoop MapReduce data processing takes place in two phases: the Map phase and the Reduce phase.<\/p>\n<ul>\n<li><strong>Map phase-<\/strong> It is the first phase of data processing. In this phase, we specify all the complex logic\/business rules\/costly code.<\/li>\n<li><strong>Reduce phase-<\/strong> It is the second phase of processing. 
In this phase, we specify light-weight processing like aggregation\/summation.<\/li>\n<\/ul>\n<h3>Steps of MapReduce Job Execution flow<\/h3>\n<p>MapReduce processes the data in several phases with the help of different components. Let&#8217;s discuss the steps of job execution in Hadoop.<\/p>\n<h4>1. Input Files<\/h4>\n<p>The data for a MapReduce job is stored in input files, which reside in <strong>HDFS<\/strong>. The format of the input files is arbitrary: line-based log files and binary formats can both be used.<\/p>\n<h4>2. InputFormat<\/h4>\n<p>InputFormat defines how to split and read these input files. It selects the files or other objects to use as input and creates the InputSplits.<\/p>\n<h4>3. InputSplits<\/h4>\n<p>An InputSplit represents the data that will be processed by an individual <strong>Mapper<\/strong>. One map task is created for each split, so the number of map tasks equals the number of InputSplits. The framework divides each split into records, which the mapper processes.<\/p>\n<h4>4. RecordReader<\/h4>\n<p>The RecordReader communicates with the InputSplit and converts its data into <strong>key-value pairs<\/strong> suitable for reading by the Mapper. By default, the job uses TextInputFormat, whose RecordReader converts each line of the input into a key-value pair.<\/p>\n<p>The RecordReader keeps reading from the InputSplit until the whole split has been read. With TextInputFormat, the key is the byte offset of the line in the file and the value is the content of the line. These key-value pairs are then sent to the mapper for further processing.<\/p>\n<h4>5. Mapper<\/h4>\n<p>The Mapper processes each input record produced by the RecordReader and generates intermediate key-value pairs. The intermediate output can be completely different from the input pair. The output of the mapper is the full collection of these key-value pairs.<\/p>\n<p>The Hadoop framework doesn\u2019t store the output of the mapper on HDFS. This data is temporary, and writing it to HDFS would create unnecessary multiple copies. 
The mapper output is then passed to the combiner, if one is configured, for further processing.<\/p>\n<h4>6. Combiner<\/h4>\n<p>The Combiner is a mini-reducer that performs local aggregation on the mapper\u2019s output. It minimizes the data transfer between mapper and reducer. When the combiner completes, the framework passes its output to the partitioner for further processing.<\/p>\n<h4>7. Partitioner<\/h4>\n<p>The Partitioner comes into play only when the job has more than one reducer. It takes the output of the combiner and performs partitioning.<\/p>\n<p>Partitioning is based on the key: a hash function applied to the key (or a subset of the key) determines the partition.<\/p>\n<p>Records that have the same key therefore go into the same partition. After that, each partition is sent to a reducer.<\/p>\n<p>Partitioning in MapReduce execution helps distribute the map output evenly across the reducers.<\/p>\n<h4>8. Shuffling and Sorting<\/h4>\n<p>After partitioning, the output is shuffled to the reducer nodes. Shuffling is the physical movement of the data over the network.<\/p>\n<p>Once all the mappers have finished and their output has been shuffled to the reducer nodes, the framework merges and sorts this intermediate output. The result is then provided as input to the reduce phase.<\/p>\n<h4>9. Reducer<\/h4>\n<p>The Reducer takes the set of intermediate key-value pairs produced by the mappers as its input. It then runs the reduce function on each key and its grouped values to generate the output.<\/p>\n<p>The output of the reducer is the final output. The framework stores this output on HDFS.<\/p>\n<h4>10. RecordWriter<\/h4>\n<p>The RecordWriter writes the output key-value pairs from the Reducer phase to the output files.<\/p>\n<h4>11. OutputFormat<\/h4>\n<p>OutputFormat defines how the RecordWriter writes these output key-value pairs to the output files. 
The OutputFormat instances provided by Hadoop write files to HDFS. Thus, OutputFormat instances write the final output of the reducer to HDFS.<\/p>\n<h3>Conclusion<\/h3>\n<p>We have walked through the MapReduce job execution flow step by step. We hope this blog helps you understand how MapReduce works.<\/p>\n<p>If you still have any questions about the MapReduce job execution flow, share them with us in the comment section below. We will try our best to answer them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Hadoop blog, we are going to provide you an end to end MapReduce job execution flow. Here we will describe each component which is the part of MapReduce working in detail. This&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73204,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[457,539,541,463,635,636],"class_list":["post-2004","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-big-data","tag-big-data-hadoop","tag-hadoop","tag-mapreduce","tag-mapreduce-job-execution-flow","tag-mapreduce-working"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Hadoop MapReduce Job Execution flow Chart - TechVidvan<\/title>\n<meta name=\"description\" content=\"Steps for Hadoop MapReduce Job Execution flow explains Working of mapreduce in Hadoop,How Mapreduce Processes the Data using Mapper,Reducer,InputFormat etc.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop MapReduce 
Job Execution flow Chart - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Steps for Hadoop MapReduce Job Execution flow explains Working of mapreduce in Hadoop,How Mapreduce Processes the Data using Mapper,Reducer,InputFormat etc.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-07T07:17:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Hadoop MapReduce Job Execution flow Chart - TechVidvan","description":"Steps for Hadoop MapReduce Job Execution flow explains Working of mapreduce in Hadoop,How Mapreduce Processes the Data using Mapper,Reducer,InputFormat etc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop MapReduce Job Execution flow Chart - TechVidvan","og_description":"Steps for Hadoop MapReduce Job Execution flow explains Working of mapreduce in Hadoop,How Mapreduce Processes the Data using Mapper,Reducer,InputFormat etc.","og_url":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-07T07:17:18+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Hadoop MapReduce Job Execution flow Chart","datePublished":"2017-10-07T07:17:18+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/"},"wordCount":783,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg","keywords":["big data","big data hadoop","hadoop","MapReduce","MapReduce Job Execution flow","MapReduce Working"],"articleSection":["MapReduce Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/","url":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/","name":"Hadoop MapReduce Job Execution flow Chart - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg","datePublished":"2017-10-07T07:17:18+00:00","description":"Steps for Hadoop MapReduce Job Execution flow explains 
Working of mapreduce in Hadoop,How Mapreduce Processes the Data using Mapper,Reducer,InputFormat etc.","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/mapreduce-job-execution-flow.jpg","width":802,"height":420,"caption":"mapreduce job execution flow"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/mapreduce-job-execution-flow\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Hadoop MapReduce Job Execution flow Chart"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2004"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2004\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73204"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}