{"id":2041,"date":"2018-01-17T09:03:03","date_gmt":"2018-01-17T09:03:03","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=1070"},"modified":"2018-01-17T09:03:03","modified_gmt":"2018-01-17T09:03:03","slug":"spark-streaming-execution-flow","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/","title":{"rendered":"Spark Streaming Execution Flow and Streaming Model"},"content":{"rendered":"<p>Spark Streaming enables fast, scalable and fault-tolerant processing of live data streams. In this article, we will walk through the Spark Streaming execution flow. We will start with the basics of Apache Spark Streaming and then look at the streaming model in detail.<\/p>\n<p>We will also cover how stream execution works internally in Spark, and finish with a brief look at Spark Streaming sources.<\/p>\n<p><a href=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Spark-Streaming-Execution-Flow.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73300\" src=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/sites\/2\/2019\/11\/Spark-Streaming-Execution-Flow.jpg\" alt=\"learn the execution flow of apache spark streaming\" width=\"1200\" height=\"628\" \/><\/a><\/p>\n<h3>What is Spark Streaming Execution Flow?<\/h3>\n<p>Spark Streaming was introduced as Spark\u2019s answer to the need for stream processing. It enables fast, scalable and fault-tolerant processing of live data streams. Being part of the wider Spark ecosystem is one of its major advantages over competing systems.<\/p>\n<p>Spark Streaming makes it possible to integrate stream processing with machine learning. It can ingest data from many sources, such as Kafka, Flume, Twitter, HDFS or S3.<\/p>\n<p>The data can then be processed using high-level functions such as map, reduce, join and window. 
Finally, the processed data can be pushed out to filesystems, databases and live dashboards.<\/p>\n<p>Spark Streaming offers a unified programming model and execution engine for both batch and streaming workloads. Compared with traditional streaming systems, this design has four main advantages:<\/p>\n<p>1. Fast recovery from failures and stragglers.<\/p>\n<p>2. Better load balancing and resource usage.<\/p>\n<p>3. Streaming data can be combined with static datasets and interactive queries.<\/p>\n<p>4. Native integration with advanced processing libraries such as SQL, machine learning and graph processing.<\/p>\n<h3>Spark Streaming Execution Flow &#8211; Streaming Model<\/h3>\n<p>Instead of processing the data one record at a time, Spark Streaming discretizes it into tiny micro-batches. In this model, receivers accept data in parallel and buffer it in the memory of Spark\u2019s worker nodes.<\/p>\n<p>The Spark engine then runs short tasks to process the batches and outputs the results to other systems.<\/p>\n<p><b>Note:<\/b> The Spark Streaming model differs from the traditional continuous operator model. In traditional models, computation is statically allocated to a node. Spark Streaming, by contrast, dynamically assigns tasks to workers based on the locality of the data and the available resources. This enables both better load balancing and faster fault recovery.<\/p>\n<p>Each batch of data is a <strong>Resilient Distributed Dataset<\/strong> (RDD), Spark\u2019s basic abstraction of a fault-tolerant dataset. Because every batch is an ordinary RDD, streaming data can be processed with any Spark code or library.<\/p>\n<h3>Working of Spark Streaming<\/h3>\n<p>Spark Streaming first receives live input data streams and then divides the data into batches. 
The Spark engine then processes those batches to generate the final stream of results, also in batches.<\/p>\n<p>Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. A DStream is processed using Spark APIs, and the results are returned in batches.<\/p>\n<p>DStreams can be created in two ways: from input data streams from sources such as Kafka, Flume and Kinesis, or by applying high-level operations on other DStreams.<\/p>\n<p>Internally, a DStream is represented as a sequence of RDDs. Streaming programs can be written in Scala, Java or Python. Spark Streaming also supports stateful computations, which maintain state based on the data arriving in the stream, and window operations, which apply transformations over a sliding window of data.<\/p>\n<h3>Spark Streaming Sources<\/h3>\n<p>Every input DStream (except file streams) is associated with a receiver object, which receives the data from a source and stores it in Spark\u2019s memory for processing.<\/p>\n<p>Built-in streaming sources fall into two categories:<\/p>\n<ul>\n<li><b>Basic sources<\/b><\/li>\n<\/ul>\n<p>Basic sources are available directly in the StreamingContext API, for example file systems and socket connections.<\/p>\n<ul>\n<li><b>Advanced sources<\/b><\/li>\n<\/ul>\n<p>Advanced sources, such as Kafka, Flume and Kinesis, are available through extra utility classes and require linking against extra dependencies.<\/p>\n<p>In terms of reliability, there are two kinds of receivers:<\/p>\n<ul>\n<li><b>Reliable Receiver<\/b><\/li>\n<\/ul>\n<p>A reliable receiver sends an acknowledgment to a reliable source only after the data has been received and stored in Spark with replication.<\/p>\n<ul>\n<li><b>Unreliable Receiver<\/b><\/li>\n<\/ul>\n<p>An unreliable receiver does not send an acknowledgment to the source. 
Such receivers can be used for sources that do not support acknowledgment, or for reliable sources when the complexity of acknowledgment is not needed.<\/p>\n<h3>Conclusion<\/h3>\n<p>We have now covered the Spark Streaming execution flow, from receiving live data streams to processing them as micro-batches of RDDs. We hope this article helps you understand the topic better. If you have any questions, feel free to ask in the comment section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Spark Streaming enables fast, scalable and fault-tolerant processing of live data streams. In this article, we will learn the whole concept of spark streaming execution flow. For better understanding, we will start with basics&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73300,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[614],"tags":[995,996,997],"class_list":["post-2041","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","tag-spark-streaming-execution-flow","tag-spark-streaming-flow","tag-spark-streaming-job-flow"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark Streaming Execution Flow and Streaming Model - TechVidvan<\/title>\n<meta name=\"description\" content=\"Spark Streaming Execution Flow-What is Spark Streaming Execution flow,Streaming Model,Working of Spark Streaming flow,Sources of Spark Streaming\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark Streaming Execution Flow and Streaming Model - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Spark Streaming Execution Flow-What is 
Spark Streaming Execution flow,Streaming Model,Working of Spark Streaming flow,Sources of Spark Streaming\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-17T09:03:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spark Streaming Execution Flow and Streaming Model - TechVidvan","description":"Spark Streaming Execution Flow-What is Spark Streaming Execution flow,Streaming Model,Working of Spark Streaming flow,Sources of Spark Streaming","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/","og_locale":"en_US","og_type":"article","og_title":"Spark Streaming Execution Flow and Streaming Model - TechVidvan","og_description":"Spark Streaming Execution Flow-What is Spark Streaming Execution flow,Streaming Model,Working of Spark Streaming flow,Sources of Spark Streaming","og_url":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2018-01-17T09:03:03+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Spark Streaming Execution Flow and Streaming Model","datePublished":"2018-01-17T09:03:03+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/"},"wordCount":728,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg","keywords":["spark streaming execution flow","spark streaming flow","spark streaming job flow"],"articleSection":["Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/","url":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/","name":"Spark Streaming Execution Flow and Streaming Model - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg","datePublished":"2018-01-17T09:03:03+00:00","description":"Spark Streaming Execution Flow-What 
is Spark Streaming Execution flow,Streaming Model,Working of Spark Streaming flow,Sources of Spark Streaming","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Streaming-Execution-Flow.jpg","width":1200,"height":628,"caption":"learn the execution flow of apache spark streaming"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/spark-streaming-execution-flow\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Spark Streaming Execution Flow and Streaming Model"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2041","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2041"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2041\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73300"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2041"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2041"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2041"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}