{"id":322,"date":"2017-10-04T06:17:59","date_gmt":"2017-10-04T06:17:59","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=322"},"modified":"2017-10-04T06:17:59","modified_gmt":"2017-10-04T06:17:59","slug":"inputsplit-in-hadoop-mapreduce","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/","title":{"rendered":"What is InputSplit in Hadoop MapReduce?"},"content":{"rendered":"<p>In our previous <strong>Hadoop tutorial<\/strong>, we studied the Hadoop\u00a0<strong>Partitioner<\/strong> in detail. Now we are going to discuss InputSplit in Hadoop MapReduce.<\/p>\n<p>Here, we will cover what a Hadoop InputSplit is and why MapReduce needs it. We will also discuss in detail how InputSplits are created in Hadoop MapReduce.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73179\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg\" alt=\"InputSplit in hadoop Mapreduce\" width=\"1200\" height=\"628\" \/><\/a><\/p>\n<h3>Introduction to InputSplit in Hadoop<\/h3>\n<p>InputSplit is the logical representation of data in Hadoop MapReduce. It represents the data that an individual <strong>mapper<\/strong> processes, so the number of map tasks is equal to the number of InputSplits. The framework divides each split into records, which the mapper processes one at a time.<\/p>\n<p>The length of an InputSplit is measured in bytes. Every InputSplit also has a set of storage locations (hostname strings). The MapReduce system uses these storage locations to place map tasks as close to the split\u2019s data as possible.<\/p>\n<p>The framework processes map tasks in order of split size, so that the largest split gets processed first (a greedy approximation algorithm). 
This minimizes the job run time.<\/p>\n<p>The main point to remember is that an InputSplit does not contain the input data; it is just a reference to the data.<\/p>\n<h3>How are InputSplits created in Hadoop MapReduce?<\/h3>\n<p>As a user, we don\u2019t deal with InputSplit in Hadoop directly, because the <strong>InputFormat<\/strong> creates it (the InputFormat is responsible for creating the InputSplits and dividing them into records). By default, FileInputFormat breaks a file into chunks the size of an HDFS block, which is 128 MB in a default Hadoop 2.x or later configuration.<\/p>\n<p>The user can change the minimum split size as required by setting the <strong>mapred.min.split.size<\/strong> parameter in <strong>mapred-site.xml<\/strong> (newer Hadoop releases name this property <strong>mapreduce.input.fileinputformat.split.minsize<\/strong> and treat the old name as deprecated). We can also override the parameter for a particular MapReduce job through the Job object used to submit it.<\/p>\n<p>By writing a custom InputFormat, we can also control how the file is broken into splits.<\/p>\n<p>The split size is therefore user-configurable: the user can tune it in a MapReduce program based on the size of the data. Because each split gets its own map task, the split size also determines the number of map tasks in a job.<\/p>\n<p>The client calculates the splits for the job by calling <strong>getSplits()<\/strong> on the InputFormat. It then sends them to the application master, which uses their storage locations to schedule the map tasks that will process them on the cluster.<\/p>\n<p>Each map task then passes its split to the <strong>createRecordReader()<\/strong> method of the InputFormat to obtain a <strong>RecordReader<\/strong> for that split. The RecordReader generates records <strong>(key-value pairs)<\/strong>, which are passed to the map function.<\/p>\n<h3>Conclusion<\/h3>\n<p>In conclusion, an InputSplit represents the data that an individual mapper processes. For each split, one map task is created. 
The InputFormat is what creates the InputSplits.<\/p>\n<p>If you have any questions about InputSplit in MapReduce, please leave a comment in the section below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our previous Hadoop tutorial, we have studied Hadoop\u00a0Partitioner in detail. Now we are going to discuss InputSplit in Hadoop MapReduce. Here, we will cover what is Hadoop InputSplit, the need of InputSplit\u00a0in MapReduce.&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73179,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[538,457,539,541,543,604],"class_list":["post-322","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-apache-hadoop","tag-big-data","tag-big-data-hadoop","tag-hadoop","tag-hadoop-tutorial","tag-inputsplit"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>What is InputSplit in Hadoop MapReduce? - TechVidvan<\/title>\n<meta name=\"description\" content=\"MapReduce InputSplit Introduction covers what is InputSplit in Hadoop,how Hadoop creates InputSplits,how to change the split size in Hadoop,how Hadoop works\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is InputSplit in Hadoop MapReduce? 
- TechVidvan\" \/>\n<meta property=\"og:description\" content=\"MapReduce InputSplit Introduction covers what is InputSplit in Hadoop,how Hadoop creates InputSplits,how to change the split size in Hadoop,how Hadoop works\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-04T06:17:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is InputSplit in Hadoop MapReduce? 
- TechVidvan","description":"MapReduce InputSplit Introduction covers what is InputSplit in Hadoop,how Hadoop creates InputSplits,how to change the split size in Hadoop,how Hadoop works","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/","og_locale":"en_US","og_type":"article","og_title":"What is InputSplit in Hadoop MapReduce? - TechVidvan","og_description":"MapReduce InputSplit Introduction covers what is InputSplit in Hadoop,how Hadoop creates InputSplits,how to change the split size in Hadoop,how Hadoop works","og_url":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-04T06:17:59+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"What is InputSplit in Hadoop MapReduce?","datePublished":"2017-10-04T06:17:59+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/"},"wordCount":433,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg","keywords":["apache hadoop","big data","big data hadoop","hadoop","hadoop tutorial","InputSplit"],"articleSection":["MapReduce Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/","url":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/","name":"What is InputSplit in Hadoop MapReduce? 
- TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg","datePublished":"2017-10-04T06:17:59+00:00","description":"MapReduce InputSplit Introduction covers what is InputSplit in Hadoop,how Hadoop creates InputSplits,how to change the split size in Hadoop,how Hadoop works","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/InputSplit-in-hadoop-Mapreduce-01.jpg","width":1200,"height":628,"caption":"InputSplit in hadoop Mapreduce"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/inputsplit-in-hadoop-mapreduce\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"What is InputSplit in Hadoop MapReduce?"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/322","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=322"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/322\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73179"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=322"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=322"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=322"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}