{"id":2030,"date":"2018-01-13T12:35:06","date_gmt":"2018-01-13T12:35:06","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=806"},"modified":"2018-01-13T12:35:06","modified_gmt":"2018-01-13T12:35:06","slug":"spark-transformation-operations","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/","title":{"rendered":"Apache Spark Transformation Operations"},"content":{"rendered":"<p>Seems like Spark RDDs, input DStream transformations in Apache spark also allow the data to be modified. Many of the spark transformations available on normal spark RDD\u2019s, that Dstreams support.<\/p>\n<p>In this blog, we will learn several spark transformation operations. Basically, we will cover some of the streaming operations, for example, spark map, <a href=\"https:\/\/techvidvan.com\/tutorials\/spark-map-and-flatmap-comparison\/\">flatmap<\/a>, filter, count, ReduceByKey, CountByValue, and UpdateStateByKey.<\/p>\n<p>Meanwhile, we will use spark transformation\u00a0examples, those will help you in further Spark jobs.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73341\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg\" alt=\"types of spark transformation operations\" width=\"1200\" height=\"628\" \/><\/a><\/p>\n<h3>Types of Spark Transformation Operations<b><br \/>\n<\/b><\/h3>\n<p>Here, we have discussed the most common streaming transformation operations <span class=\"passivevoice\">being used, f<\/span>or example map(), flatmap(), filter(), CountByValue and UpdateStateByKey, with their examples.<\/p>\n<div id=\"attachment_73068\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Apache-Spark-Streaming-Transformation-Operations-f01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-73068\" class=\"wp-image-73068 size-full\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Apache-Spark-Streaming-Transformation-Operations-f01.jpg\" alt=\"Spark Transformation Operations\" width=\"1200\" height=\"628\" \/><\/a><p id=\"caption-attachment-73068\" class=\"wp-caption-text\">Transformation Operations in Apache Spark streaming<\/p><\/div>\n<h4>a. Map()<\/h4>\n<p>Basically, it returns a new DStream, while passing each element of the source DStream through function DStream.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val config = new SparkConf().setMaster(\"local[2]\") .setAppName(\"MapOpTest\")\n\nval ssc = new StreamingContext(config , Seconds(1))\n\nval word = ssc.socketTextStream(\"localhost\", 9999)\n\nval answer = word.map { words=&gt; (\"hello\" ,words ) }\u00a0\u00a0\u00a0 \/\/ map hello with each line\n\nanswer.print()\n\nssc.start()\u00a0\u00a0\u00a0 \/\/ Start the computation\n\nssc.awaitTermination()\u00a0\u00a0\u00a0 \/\/ Wait for termination\n\n}\n<\/pre>\n<h4>b. FlatMap()<\/h4>\n<p>FlatMap() operation is similar to map() operation, while the only difference is, each input item can be mapped to 0 or more output items.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val data = ssc.socketTextStream(\"localhost\", 9999)\n\nval word = data.flatMap(_.split(\" \"))\u00a0\u00a0\u00a0 \/\/ for each line it split the words by space\n\nval pairs = word.map(x =&gt; (x, 1))\n\nval wordCount = pairs.reduceByKey(_ + _)\n\nwordCount.print()<\/pre>\n<h4>c. Filter()<\/h4>\n<p>Basically, It returns a new DStream, while selecting only the records of the source DStream on which <i>func<\/i> returns true.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val line = ssc.socketTextStream(\"localhost\", 9999)\n\nval word = line.flatMap(_.split(\" \"))\n\nval output = word.filter { x =&gt; x.startsWith(\"s\") }\u00a0\u00a0\u00a0 \/\/ filter the words starts with letter\u201cs\u201d\n\noutput.print()<\/pre>\n<h4>d. ReduceByKey(func, [numTasks])<\/h4>\n<p>In spark, when called on a DStream of (K, V) pairs, reduceByKey(func) return a new DStream of (K, V) pairs.\u00a0<span class=\"complexword\">However<\/span>, it happens when key values <span class=\"passivevoice\">are aggregated by<\/span> using the given reduce function.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val line = ssc.socketTextStream(\"localhost\", 9999)\n\nval words = line.flatMap(_.split(\" \"))\n\nval pair = words.map(word =&gt; (word, 1))\n\nval wordCount = pair.reduceByKey(_ + _)\n\nwordCount.print()<\/pre>\n<h4>e. CountByValue()<\/h4>\n<p>In spark, when called on a DStream of elements of type K, countByValue() returns a new DStream of (K, Long) pairs. Only where the value of each key is its frequency in each spark RDD of the source DStream.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">val lines = ssc.socketTextStream(\"localhost\", 9999)\n\nval word = lines.flatMap(_.split(\" \"))\n\nword.countByValue().print()<\/pre>\n<h4>f. UpdateStateByKey()<\/h4>\n<p>It allows maintaining arbitrary state while continuously updating it with new information. Moreover, there are two steps to use this operation:<\/p>\n<ul>\n<li><b> Define \u00a0state <\/b><\/li>\n<\/ul>\n<p>It can be an arbitrary data type.<\/p>\n<ul>\n<li><b> Define state update function <\/b><\/li>\n<\/ul>\n<p>Specified with a function that how to update state, while using previous state and new values from an input stream.<\/p>\n<p>In addition, Spark will apply the state update function for all existing keys, in every batch. Regardless of whether they have new data in a batch or not. If in any case, the update function returns none, then the key-value pair will be eliminated.<\/p>\n<p><strong>For Example<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">def updateFunc(values: Seq[Int], state: Option[Int]): Option[Int] = {\n\nval currentCount = values.sum\n\nval previousCount = state.getOrElse(0)\n\nSome(currentCount + previousCount)\n\n}\n\nval ssc = new StreamingContext(conf , Seconds(10))\n\nval lines = ssc.socketTextStream(\"localhost\", 9999)\n\nssc.checkpoint(\"\/home\/asus\/checkpoints\/\")\u00a0\u00a0\u00a0 \/\/ Here .\/checkpoints\/ are the directory where all checkpoints are stored.\n\nval words = lines.flatMap(_.split(\" \"))\n\nval pair = words.map(word =&gt; (word, 1))\n\nval globalCountStream = pair.updateStateByKey(updateFunc)\n\nglobalCountStream.print()\n\nssc.start() \u00a0\u00a0\/\/ Start the computation\n\nssc.awaitTermination()<\/pre>\n<h3>Conclusion<\/h3>\n<p>Hence, in spark transformation operations we have discussed some common transformation operations in spark. As a result, we hope these examples will help you in further Spark jobs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Seems like Spark RDDs, input DStream transformations in Apache spark also allow the data to be modified. Many of the spark transformations available on normal spark RDD\u2019s, that Dstreams support. In this blog, we&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73341,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[614],"tags":[908,909,910,911,912,913,914],"class_list":["post-2030","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","tag-apache-spark-streaming-transformation-operations","tag-examples-of-spark-transformation-operations","tag-spark-transformation-operations","tag-streaming-transformation-operations-in-apache-spark","tag-transformation-operations-example","tag-transformation-operations-in-spark","tag-transformation-options-in-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Spark Transformation Operations - TechVidvan<\/title>\n<meta name=\"description\" content=\"Spark Transformation Operations- what is transformation operations in spark, types of spark transformation operations map,flatmap, filter,countbyvalue,updatestatebykey with examples of tranformation operations of spark\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Spark Transformation Operations - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Spark Transformation Operations- what is transformation operations in spark, types of spark transformation operations map,flatmap, filter,countbyvalue,updatestatebykey with examples of tranformation operations of spark\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-13T12:35:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Spark Transformation Operations - TechVidvan","description":"Spark Transformation Operations- what is transformation operations in spark, types of spark transformation operations map,flatmap, filter,countbyvalue,updatestatebykey with examples of tranformation operations of spark","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/","og_locale":"en_US","og_type":"article","og_title":"Apache Spark Transformation Operations - TechVidvan","og_description":"Spark Transformation Operations- what is transformation operations in spark, types of spark transformation operations map,flatmap, filter,countbyvalue,updatestatebykey with examples of tranformation operations of spark","og_url":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2018-01-13T12:35:06+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Apache Spark Transformation Operations","datePublished":"2018-01-13T12:35:06+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/"},"wordCount":408,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg","keywords":["apache spark streaming transformation operations","examples of spark transformation operations","spark transformation operations","Streaming Transformation Operations in Apache Spark","transformation operations example","transformation operations in spark","transformation options in spark"],"articleSection":["Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/","url":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/","name":"Apache Spark Transformation Operations - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg","datePublished":"2018-01-13T12:35:06+00:00","description":"Spark Transformation Operations- what is transformation operations in spark, types of spark transformation operations map,flatmap, filter,countbyvalue,updatestatebykey with examples of tranformation operations of spark","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Types-of-Spark-Transformation-Operations-01.jpg","width":1200,"height":628,"caption":"types of spark transformation operations"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/spark-transformation-operations\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Transformation Operations"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2030","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2030"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2030\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73341"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2030"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}