{"id":1996,"date":"2017-10-04T09:11:15","date_gmt":"2017-10-04T09:11:15","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=326"},"modified":"2017-10-04T09:11:15","modified_gmt":"2017-10-04T09:11:15","slug":"hadoop-inputsplit-vs-blocks","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/","title":{"rendered":"Difference Between InputSplit vs Blocks in Hadoop"},"content":{"rendered":"<p>In this MapReduce tutorial, we compare MapReduce InputSplit and HDFS blocks in <a href=\"https:\/\/techvidvan.com\/tutorials\/apache-hadoop-tutorials\/\"><strong>Hadoop<\/strong><\/a>. First, we look at what an HDFS data block is, and then at what a Hadoop InputSplit is.<\/p>\n<p>Then we go through the feature-wise differences between InputSplit and blocks. Finally, we walk through an example of Hadoop InputSplit and data blocks in HDFS.<\/p>\n<h3>Introduction to InputSplit and Blocks in Hadoop<\/h3>\n<p>Let&#8217;s first discuss what HDFS data blocks and Hadoop InputSplits are, one by one.<\/p>\n<h4>1. What is a Block in HDFS?<\/h4>\n<p><strong>Hadoop HDFS<\/strong> splits large files into chunks known as blocks. A block is the minimum amount of data that HDFS reads or writes. HDFS stores each file as a sequence of blocks.<\/p>\n<p>Hadoop distributes the data blocks across multiple nodes. The HDFS client doesn\u2019t have any control over block details such as block location; the NameNode decides all such things.<\/p>\n<h4>2. What is InputSplit in Hadoop?<\/h4>\n<p>An InputSplit represents the data that an individual <strong>mapper<\/strong> processes. Thus the number of map tasks is equal to the number of InputSplits. The framework divides each split into records, which the mapper processes.<\/p>\n<p>Initially, input files store the data for a MapReduce job. Input files typically reside in HDFS. <strong>InputFormat<\/strong> describes how to split up and read input files. 
InputFormat is responsible for creating the InputSplits.<\/p>\n<h3>Comparison Between InputSplit vs Blocks in Hadoop<\/h3>\n<p>Let&#8217;s now discuss the feature-wise differences between InputSplit and blocks in the Hadoop framework.<\/p>\n<h4>1. Data Representation<\/h4>\n<ul>\n<li><strong>Block &#8211;\u00a0<\/strong>An HDFS block is the physical representation of data in Hadoop.<\/li>\n<li><strong>InputSplit &#8211;\u00a0<\/strong>A MapReduce InputSplit is the logical representation of the data present in the blocks. It is used during data processing in a MapReduce program or other processing techniques. The key point is that an InputSplit doesn\u2019t contain actual data; it is just a reference to the data.<\/li>\n<\/ul>\n<h4>2. Size<\/h4>\n<ul>\n<li><strong>Block &#8211;\u00a0<\/strong>By default, the HDFS block size is <strong>128MB<\/strong>, which you can change as per your requirement. All blocks of a file are the same size except the last block, which can be either the same size or smaller. The Hadoop framework breaks files into 128 MB blocks and then stores them in the Hadoop file system.<\/li>\n<li><strong>InputSplit &#8211;\u00a0<\/strong>By default, the InputSplit size is approximately equal to the block size. The split size is user defined: in a MapReduce program, the user can control it based on the size of the data.<\/li>\n<\/ul>\n<h4>3. Example of Block and InputSplit in Hadoop<\/h4>\n<p>Suppose we need to store a file in HDFS. Hadoop HDFS stores files as blocks. A block is the smallest unit of data that can be stored on or retrieved from the disk.<\/p>\n<p>The default size of a block is 128MB. Hadoop HDFS breaks files into blocks. Then it stores these blocks on different nodes in the cluster.<\/p>\n<p>For example, suppose we have a file of 132 MB. 
So HDFS will break this file into 2 blocks: one of 128 MB and one of 4 MB.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2017\/10\/hdfs-data-block.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-343\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2017\/10\/hdfs-data-block.png\" alt=\"\" width=\"1263\" height=\"189\" \/><\/a>Now, if we try to perform a MapReduce operation on the blocks directly, it will not work correctly. The reason is that the 2<sup>nd<\/sup>\u00a0block is incomplete: a record can start in one block and end in the next. InputSplit solves this problem.<\/p>\n<p>A MapReduce InputSplit forms a logical grouping of blocks as a single unit, because the InputSplit includes the location of the next block and the byte offset of the data needed to complete the block.<a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2017\/10\/hdfs-data-blocks.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-344\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2017\/10\/hdfs-data-blocks.png\" alt=\"\" width=\"1777\" height=\"292\" \/><\/a><\/p>\n<h3>Conclusion<\/h3>\n<p>Hence, an InputSplit is only a logical chunk of data, i.e. it holds just the information about block addresses or locations, while a block is the physical representation of data.<\/p>\n<p>We hope you now have a clearer understanding of InputSplit and HDFS data blocks after reading this blog. If you find any other difference between InputSplit and blocks, do let us know in the comment section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this MapReduce tutorial, we will discuss the comparison between MapReduce InputSplit vs Blocks in Hadoop. Firstly, we will see what is HDFS data blocks next to what is Hadoop InputSplit. 
Then we will&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73180,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[538,457,541,616,543,617],"class_list":["post-1996","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-apache-hadoop","tag-big-data","tag-hadoop","tag-hadoop-blocks-vs-inputsplit","tag-hadoop-tutorial","tag-inputsplit-vs-blocks-in-hadoop"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Difference Between InputSplit vs Blocks in Hadoop - TechVidvan<\/title>\n<meta name=\"description\" content=\"Feature wise comparison between Hadoop InputSplit vs Blocks cover the difference of Size,data representation in InputSplit vs block,InputSplit-Block Example\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Difference Between InputSplit vs Blocks in Hadoop - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Feature wise comparison between Hadoop InputSplit vs Blocks cover the difference of Size,data representation in InputSplit vs block,InputSplit-Block Example\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-04T09:11:15+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Difference Between InputSplit vs Blocks in Hadoop - TechVidvan","description":"Feature wise comparison between Hadoop InputSplit vs Blocks cover the difference of Size,data representation in InputSplit vs block,InputSplit-Block Example","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/","og_locale":"en_US","og_type":"article","og_title":"Difference Between InputSplit vs Blocks in Hadoop - TechVidvan","og_description":"Feature wise comparison between Hadoop InputSplit vs Blocks cover the difference of Size,data representation in InputSplit vs block,InputSplit-Block 
Example","og_url":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-04T09:11:15+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Difference Between InputSplit vs Blocks in Hadoop","datePublished":"2017-10-04T09:11:15+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/"},"wordCount":604,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg","keywords":["apache hadoop","big data","hadoop","Hadoop Blocks vs InputSplit","hadoop tutorial","InputSplit vs Blocks in hadoop"],"articleSection":["MapReduce 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/","url":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/","name":"Difference Between InputSplit vs Blocks in Hadoop - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg","datePublished":"2017-10-04T09:11:15+00:00","description":"Feature wise comparison between Hadoop InputSplit vs Blocks cover the difference of Size,data representation in InputSplit vs block,InputSplit-Block Example","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/inputsplit-vs-blocks-in-hadoop.jpg","width":802,"height":420,"caption":"inputsplit vs blocks in hadoop"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-inputsplit-vs-blocks\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Difference 
Between InputSplit vs Blocks in Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/1996","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=1996"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/1996\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73180"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=1996"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=1996"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=1996"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}