{"id":1995,"date":"2017-10-03T09:18:48","date_gmt":"2017-10-03T09:18:48","guid":{"rendered":"http:\/\/techvidvan.com\/tutorials\/?p=271"},"modified":"2017-10-03T09:18:48","modified_gmt":"2017-10-03T09:18:48","slug":"hadoop-recordreader-introduction","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/","title":{"rendered":"Hadoop RecordReader Introduction, Working &amp; Types"},"content":{"rendered":"<p>In our previous blog, we studied <strong>Hadoop<\/strong> <strong>Counters<\/strong> in detail. In this tutorial, we are going to discuss the RecordReader in Hadoop.<\/p>\n<p>Here we will cover the introduction to Hadoop RecordReader and the working of RecordReader. We will also discuss the types of RecordReader in MapReduce and the maximum size of a single record in Hadoop MapReduce.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/RecordReader-in-Hadoop-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73235\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/RecordReader-in-Hadoop-01.jpg\" alt=\"working of hadoop recordreader\" width=\"1200\" height=\"628\" \/><\/a><\/p>\n<h3>What is RecordReader in MapReduce?<\/h3>\n<p>A RecordReader converts the byte-oriented view of the input into a record-oriented view for the <strong>Mapper<\/strong> and <strong>Reducer<\/strong> tasks for processing.<\/p>\n<p>To understand Hadoop RecordReader, we need to understand the MapReduce data flow. Let us see how the data flows:<\/p>\n<p>MapReduce is a simple model of data processing. Inputs and outputs for the map and reduce functions are <strong>key-value pairs<\/strong>. 
Following is the general form of the map and reduce functions:<\/p>\n<ul>\n<li><strong>Map:<\/strong> (K1, V1) \u2192 list (K2, V2)<\/li>\n<li><strong>Reduce:<\/strong> (K2, list (V2)) \u2192 list (K3, V3)<\/li>\n<\/ul>\n<p>Before processing starts, the job needs to know which data to process. The <strong>InputFormat<\/strong> class handles this. It selects the files in <strong>HDFS<\/strong> that are the input to the map function. It is also responsible for creating the input splits.<\/p>\n<p>It also defines how the splits are divided into records. The input is divided into a number of splits, each typically the size of an HDFS block (64 or 128 MB). Each such piece is known as an InputSplit. An InputSplit is a logical representation of the data. In a MapReduce job, the number of map tasks is equal to the number of InputSplits.<\/p>\n<p>By calling <strong>getSplits()<\/strong>, the client calculates the splits for the job. They are then sent to the application master, which uses their storage locations to schedule the map tasks that will process them on the cluster.<\/p>\n<p>After that, the map task passes the split to the <strong>createRecordReader()<\/strong> method of the InputFormat. From that, it obtains a RecordReader for the split. The RecordReader generates records (key-value pairs), which are then passed to the map function.<\/p>\n<p>During MapReduce job execution, the Hadoop RecordReader uses the data within the boundaries created by the InputSplit and creates key-value pairs for the mapper. The \u201cstart\u201d is the byte position in the file at which the RecordReader starts generating key-value pairs.<\/p>\n<p>The \u201cend\u201d is where the RecordReader stops reading records. In the RecordReader, the data is loaded from its source.<\/p>\n<p>Then the data is converted into key-value pairs suitable for reading by the Mapper. 
It communicates with the InputSplit until the file reading is completed.<\/p>\n<h3>How does RecordReader work in Hadoop?<\/h3>\n<p>A RecordReader is little more than an iterator over the records. The map task uses each record to generate a key-value pair, which it passes to the map function. We can see this in the mapper\u2019s run function given below:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">public void run(Context context) throws IOException, InterruptedException {\n    setup(context);\n    \/\/ nextKeyValue() advances the RecordReader; it returns false when no records remain\n    while (context.nextKeyValue()) {\n        \/\/ pass the current key and value to the user-defined map() method\n        map(context.getCurrentKey(), context.getCurrentValue(), context);\n    }\n    cleanup(context);\n}<\/pre>\n<p>Although it is not mandatory for a RecordReader to stay within the boundaries created by the InputSplit when generating key-value pairs, it usually does. A custom implementation can even read data outside of the InputSplit.<\/p>\n<p>After running <strong>setup()<\/strong>, the <strong>nextKeyValue()<\/strong> method is called repeatedly on the context. Each call populates the key and value objects for the mapper. By way of the context, the framework retrieves the key and value from the RecordReader and passes them to the <strong>map()<\/strong> method to do its work.<\/p>\n<p>Hence, the input (key-value pair) to the map function is processed as per the logic in the map code. When the reader gets to the end of the records, the <strong>nextKeyValue()<\/strong> method returns false.<\/p>\n<h3>Types of Hadoop RecordReader<\/h3>\n<p>In Hadoop, the InputFormat defines the RecordReader instance. By default, the job uses TextInputFormat, whose RecordReader converts the data into key-value pairs. Two commonly used RecordReaders are as follows:<\/p>\n<h4>1. LineRecordReader<\/h4>\n<p>It is the default RecordReader. TextInputFormat provides this RecordReader. It treats each line of the input file as a value. The associated key is the byte offset of the line within the file. 
It always skips the first line in the split (or part of it), if it is not the first split.<\/p>\n<p>It always reads one line past the end boundary of the split, if data is available (that is, if it is not the last split).<\/p>\n<h4>2. SequenceFileRecordReader<\/h4>\n<p>This Hadoop RecordReader reads data as specified by the header of a sequence file. It is the RecordReader used by SequenceFileInputFormat.<\/p>\n<h3>The maximum size of a single record<\/h3>\n<p>We set the maximum length of a single record (line) by using the parameter below. In Hadoop 2 and later, this property is named mapreduce.input.linerecordreader.line.maxlength, and the older name shown here is deprecated.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">conf.setInt(\"mapred.linerecordreader.maxlength\", Integer.MAX_VALUE);<\/pre>\n<h3>Conclusion<\/h3>\n<p>In conclusion, the Hadoop RecordReader creates the input (key-value pairs) for the Mapper. By default, TextInputFormat and its LineRecordReader are used to convert the data into key-value pairs.<\/p>\n<p>We hope you liked this blog. If you have any questions related to Hadoop RecordReader, feel free to share them with us. We will be glad to answer them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our previous blog, we have studied Hadoop Counters in detail. Now in this tutorial, we are going to discuss the RecordReader in Hadoop. 
Here we will cover the introduction to Hadoop RecordReader, working&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73235,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[570],"tags":[538,457,541,575,601,602,603],"class_list":["post-1995","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mapreduce","tag-apache-hadoop","tag-big-data","tag-hadoop","tag-hadoop-mapreduce","tag-hadoop-recordreder","tag-recordreder-in-hadoop","tag-recordreder-in-mapreduce"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Hadoop RecordReader Introduction, Working &amp; Types - TechVidvan<\/title>\n<meta name=\"description\" content=\"Introduction to MapReduce Recordreder in Hadoop,Working of Hadoop RecordReader in MapReduce,Types of RecordReader LineRecordReader, SequenceFileRecordReader\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop RecordReader Introduction, Working &amp; Types - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Introduction to MapReduce Recordreder in Hadoop,Working of Hadoop RecordReader in MapReduce,Types of RecordReader LineRecordReader, SequenceFileRecordReader\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" 
content=\"2017-10-03T09:18:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop RecordReader Introduction, Working &amp; Types - TechVidvan","description":"Introduction to MapReduce Recordreder in Hadoop,Working of Hadoop RecordReader in MapReduce,Types of RecordReader LineRecordReader, SequenceFileRecordReader","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop RecordReader Introduction, Working &amp; Types - TechVidvan","og_description":"Introduction to MapReduce Recordreder in Hadoop,Working of Hadoop RecordReader in MapReduce,Types of RecordReader LineRecordReader, 
SequenceFileRecordReader","og_url":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2017-10-03T09:18:48+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Hadoop RecordReader Introduction, Working &amp; Types","datePublished":"2017-10-03T09:18:48+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/"},"wordCount":735,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg","keywords":["apache hadoop","big data","hadoop","hadoop mapreduce","Hadoop RecordReder","RecordReder in Hadoop","RecordReder in mapReduce"],"articleSection":["MapReduce 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/","url":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/","name":"Hadoop RecordReader Introduction, Working &amp; Types - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg","datePublished":"2017-10-03T09:18:48+00:00","description":"Introduction to MapReduce Recordreder in Hadoop,Working of Hadoop RecordReader in MapReduce,Types of RecordReader LineRecordReader, SequenceFileRecordReader","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/RecordReader-in-Hadoop-01.jpg","width":1200,"height":628,"caption":"working of hadoop 
recordreader"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/hadoop-recordreader-introduction\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Hadoop RecordReader Introduction, Working &amp; Types"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/1995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=1995"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/1995\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73235"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=1995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=1995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=1995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}