{"id":2036,"date":"2018-01-19T06:51:21","date_gmt":"2018-01-19T06:51:21","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=880"},"modified":"2018-01-19T06:51:21","modified_gmt":"2018-01-19T06:51:21","slug":"spark-streaming-window-operations","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/","title":{"rendered":"Spark Streaming Window Operations- A Quick Guide"},"content":{"rendered":"<p>Spark streaming leverages advantage of windowed computations in Apache Spark. It offers to apply transformations over a sliding window of data. In this article, we will learn the whole concept of Apache spark streaming window operations.<\/p>\n<p>Moreover, we will also learn some Spark Window operations to understand in detail.<\/p>\n<h3>What are Spark Streaming Window operations<b><br \/>\n<\/b><\/h3>\n<p>Spark streaming leverages advantage of windowed computations in spark. It offers to apply transformations over a sliding window of data. The figure mentioned below explains this sliding window.<\/p>\n<div id=\"attachment_73361\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Window-Operations-01-2.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-73361\" class=\"wp-image-73361 size-full\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Window-Operations-01-2.jpg\" alt=\"spark streaming windows operations\" width=\"1200\" height=\"628\" \/><\/a><p id=\"caption-attachment-73361\" class=\"wp-caption-text\">Introduction \u2013 Spark Streaming Window operations<\/p><\/div>\n<p>As window slides over a source DStream, the source RDDs that fall within the window are combined. It also operated upon which produces spark RDDs of the windowed DStream. Hence, In this specific case, the operation is applied over the last 3 time units of data, also slides by 2-time units.<\/p>\n<p>Basically, any Spark window operation requires specifying two parameters.<\/p>\n<ul>\n<li><strong>Window length<\/strong> &#8211; It defines the duration of the window (3 in the figure).<\/li>\n<li><strong>Sliding interval<\/strong> &#8211; It defines the interval at which the window operation is performed (2 in the figure).<\/li>\n<\/ul>\n<p>However, these 2 parameters must be multiples of the batch interval of the source DStream.<\/p>\n<p>Let\u2019s understand this operation with an example.<\/p>\n<p><strong>For example: <\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import org.apache.spark._\nimport org.apache.spark.streaming._\nimport org.apache.spark.streaming.StreamingContext._ \/\/ not necessary since Spark 1.3\n\n\/\/ Create a local StreamingContext with two working thread and batch interval of 1 second.\n\/\/ The master requires 2 cores to prevent a starvation scenario.\n\nval conf1 = new SparkConf().setMaster(\"local[2]\").setAppName(\"NetworkWordCount\")\nval ssc1 = new StreamingContext(conf1, Seconds(1))<\/pre>\n<p>To enhance above example by generating word counts over the last 30 seconds of data, every 10 seconds. We need to apply the reduceByKey operation on the pairs DStream of (word, 1) pairs over the last 30 seconds of data.<\/p>\n<p>Basically, it is possible by using the operation reduceByKeyAndWindow.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">\/\/ Reduce last 30 seconds of data, every 10 seconds\nval windowedWordCounts = pairs.reduceByKeyAndWindow((a:Int,b:Int) =&gt; (a + b), Seconds(30), Seconds(10))<\/pre>\n<h3>Common Spark Window Operations<\/h3>\n<p>These operations describe two parameters &#8211; windowLength and slideInterval.<\/p>\n<h4>1. Window (windowLength, slideInterval)<\/h4>\n<p>Window operation returns a new DStream. On the basis of windowed batches of the source DStream, it gets computed.<\/p>\n<h4>2. CountByWindow (windowLength, slideInterval)<\/h4>\n<p>In the stream, countByWindow operation returns a sliding window count of elements.<\/p>\n<h4>3. ReduceByWindow (func, windowLength, slideInterval)<\/h4>\n<p>ReduceByWindow returns a new single-element stream, that is created by aggregating elements in the stream over a sliding interval using func. However, a function must be commutative and associative, so that it can be computed correctly in parallel.<\/p>\n<h4>4. ReduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])<\/h4>\n<p>Whenever we call reduceByKeyAndWindow window on a DStream of (K, V) pairs, it returns a new DStream of (K, V) pairs. Here, we aggregate values of each key, by given reduce function func over batches in a sliding window.<\/p>\n<p>In addition, it uses spark&#8217;s default number of parallel tasks, for grouping purpose. Like for local mode, it is 2. While in cluster mode it determines number using spark.default.parallelism config property. To set a different number of tasks, it passes an optional numTasks argument.<\/p>\n<h4>5. ReduceByKeyAndWindow (func, invFunc, windowLength, slideInterval, [numTasks])<\/h4>\n<p>It is the more efficient version of the above reduceByKeyAndWindow(). As in above one, we calculate the reduced value of each window by using the reduce values of the previous window.<\/p>\n<p>However, here calculations take place by reducing the new data. For calculating, it reduces data which enters the sliding window. Also performs \u201cinverse reducing\u201d of the old data which leaves the window.<\/p>\n<p><strong>Note:<\/strong>\u00a0Checkpointing must be enabled for using this operation.<\/p>\n<h4>6. CountByValueAndWindow(windowLength, slideInterval, [numTasks])<\/h4>\n<p>While, we call countByValueAndWindow on a DStream of (K, V) pairs, it returns a new DStream of (K, Long) pairs. Here, the value of each key is its frequency within a sliding window. In one case it is very similar to reduceByKeyAndWindow operation. Here also, we can configure the number of reduce tasks by an optional argument.<\/p>\n<h3>Conclusion<\/h3>\n<p>As a result, we have seen how spark windowing helps to apply transformations over a sliding window of data. Hence, this feature makes very easy to compute stats for a window of time.<\/p>\n<p>Therefore, it increases the efficiency of the system. Ultimately, we have learned the whole about spark streaming window operations\u00a0in detail.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Spark streaming leverages advantage of windowed computations in Apache Spark. It offers to apply transformations over a sliding window of data. In this article, we will learn the whole concept of Apache spark streaming&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73361,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[614],"tags":[953,954,955,956,957,958,959,960,961,962],"class_list":["post-2036","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","tag-apache-spark-window-operations-quick-guide","tag-introducing-window-functions-in-spark","tag-spark-streaming-window","tag-spark-window","tag-spark-window-operation","tag-streaming-window","tag-window-aggregation-functions","tag-window-operation","tag-window-operation-in-spark-streaming","tag-window-operations-on-spark-streaming"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark Streaming Window Operations- A Quick Guide - TechVidvan<\/title>\n<meta name=\"description\" content=\"Spark Streaming Window Operation- what is window operation in spark streaming, common spark window operations: window,countbywindow,reducebywindow,reducebykeyandwindow\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark Streaming Window Operations- A Quick Guide - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Spark Streaming Window Operation- what is window operation in spark streaming, common spark window operations: window,countbywindow,reducebywindow,reducebykeyandwindow\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-19T06:51:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Spark Streaming Window Operations- A Quick Guide - TechVidvan","description":"Spark Streaming Window Operation- what is window operation in spark streaming, common spark window operations: window,countbywindow,reducebywindow,reducebykeyandwindow","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/","og_locale":"en_US","og_type":"article","og_title":"Spark Streaming Window Operations- A Quick Guide - TechVidvan","og_description":"Spark Streaming Window Operation- what is window operation in spark streaming, common spark window operations: window,countbywindow,reducebywindow,reducebykeyandwindow","og_url":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2018-01-19T06:51:21+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Spark Streaming Window Operations- A Quick Guide","datePublished":"2018-01-19T06:51:21+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/"},"wordCount":636,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg","keywords":["Apache Spark Window operations: Quick Guide","Introducing Window Functions in Spark","Spark Streaming Window","spark window","spark window operation","streaming window","Window Aggregation Functions","window operation","Window operation in spark streaming","Window Operations on Spark Streaming"],"articleSection":["Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/","url":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/","name":"Spark Streaming Window Operations- A Quick Guide - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg","datePublished":"2018-01-19T06:51:21+00:00","description":"Spark Streaming Window Operation- what is window operation in spark streaming, common spark window operations: window,countbywindow,reducebywindow,reducebykeyandwindow","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Window-Operations-01-2.jpg","width":1200,"height":628},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-window-operations\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Spark Streaming Window Operations- A Quick Guide"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2036"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2036\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73361"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}