{"id":2032,"date":"2018-01-16T11:42:01","date_gmt":"2018-01-16T11:42:01","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=839"},"modified":"2018-01-16T11:42:01","modified_gmt":"2018-01-16T11:42:01","slug":"spark-shared-variable","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/","title":{"rendered":"Spark Shared Variable- Broadcast and Accumulators"},"content":{"rendered":"<p>A Spark shared variable is a simple concept: it is a variable that we want to share across the cluster.<\/p>\n<p>In this blog, we focus on shared variables in Spark and their two types, broadcast variables and accumulators. We explain each of them with an example.<\/p>\n<h3>What is a Shared Variable in Spark<\/h3>\n<p>When a function is passed to a Spark operation, it executes on a remote cluster node and works on separate copies of all the variables used in the function. These variables are copied to each machine.<\/p>\n<p>Updates made to the variables on the remote machine are not sent back to the driver program. Supporting general, read-write shared variables across tasks would be inefficient. However, Spark provides two limited types of shared variables for two common usage patterns:<\/p>\n<ol>\n<li>\u00a0Broadcast Variables<\/li>\n<li>\u00a0Accumulators<\/li>\n<\/ol>\n<p>Now let\u2019s discuss each of them in detail:<\/p>\n<h4>1. Broadcast Variables in Spark<\/h4>\n<p><em>Broadcast variables<\/em> allow the programmer to keep a read-only variable cached on each machine, rather than shipping a copy of it with tasks.<\/p>\n<p>We can use them, for example, to give every node a copy of a large input dataset in an efficient manner. 
Spark also tries to distribute broadcast variables using efficient broadcast algorithms, which reduces communication cost.<\/p>\n<p>Spark actions execute through a set of stages, separated by distributed \u201cshuffle\u201d operations. Spark automatically broadcasts the common data needed by tasks within each stage. Data broadcast this way is cached in serialized form and deserialized before running each task.<\/p>\n<p>Hence, creating broadcast variables explicitly is useful only in some cases: when tasks across multiple stages need the same data, or when caching the data in deserialized form is important.<\/p>\n<p>We create a broadcast variable from a variable v by calling the SparkContext.broadcast(v) method. The broadcast variable is a wrapper around v, and we can access its value by calling the value method.<\/p>\n<p><strong>For Example:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">scala&gt; val broadcastVar1 = sc.broadcast(Array(1, 2, 3))\nbroadcastVar1: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)\n\nscala&gt; broadcastVar1.value\nres0: Array[Int] = Array(1, 2, 3)<\/pre>\n<p>After we create a broadcast variable, we should use it instead of the value v in any functions run on the cluster, so that v is not shipped to the nodes more than once. It is also very important that the object v is not modified after it is broadcast. This ensures that all nodes get the same value of the broadcast variable.<\/p>\n<h4>2. Accumulators<\/h4>\n<p><em>Accumulators<\/em> are variables that are only \u201cadded\u201d to through a commutative and associative operation, so they can be supported efficiently in parallel. We can use them to implement counters or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types.<\/p>\n<p>As a user, we can create named or unnamed accumulators. 
As in the image below, a named accumulator is displayed in the web UI. Spark displays the value for each accumulator modified by a task in the \u201cTasks\u201d table.<\/p>\n<p>Tracking accumulators in the UI is useful for understanding the progress of running stages.<\/p>\n<p>We can create a numeric accumulator by calling SparkContext.longAccumulator() or SparkContext.doubleAccumulator() to accumulate values of type long or double, respectively. Tasks running on a cluster can then add to it by using the add method.<\/p>\n<p>However, tasks cannot read its value. Only the driver program can read the accumulator value, by using its value method.<\/p>\n<p><strong>For Example:<\/strong><\/p>\n<p>In this code we are using an accumulator to add up the elements of an array:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">scala&gt; val accum1 = sc.longAccumulator(\"Accumulator1\")\naccum1: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(Accumulator1), value: 0)\n\nscala&gt; sc.parallelize(Array(1, 2, 3, 4)).foreach(x =&gt; accum1.add(x))\n...\n10\/09\/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s\n\nscala&gt; accum1.value\nres2: Long = 10<\/pre>\n<p>For accumulator updates performed inside actions, Spark applies each task\u2019s update to the accumulator only once. In other words, restarted tasks will not update the value again. In transformations, however, a task\u2019s update may be applied more than once if tasks or job stages are re-executed.<\/p>\n<p>Accumulators do not change Spark\u2019s lazy evaluation model. If an accumulator is updated within an operation on a Spark RDD, its value is updated only when that RDD is computed as part of an action. Accordingly, accumulator updates made within a lazy transformation are not guaranteed to execute.<\/p>\n<h3>Conclusion<\/h3>\n<p>In this article, we have seen that Spark offers two ways of sharing objects. 
A broadcast variable holds read-only data that is copied to each executor node before the first transformation, cached there, and used for further computations.<\/p>\n<p>We have also seen how accumulators help to handle shared objects that tasks can only add to. Hopefully, this article has helped you understand the concept of shared variables.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Basically, there is a pretty simple concept of a Spark Shared variable. In simple words, these are variables those we want to share throughout our cluster. In this blog, we completely focus on Shared&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73283,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[614],"tags":[923,924,925,926,927,928,929,930],"class_list":["post-2032","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","tag-accumulators-variables","tag-apache-spark-shared-variables-broadcast-and-accumulators","tag-broadcast-variables","tag-explain-shared-variable-in-spark","tag-shared-variables-with-spark","tag-spark-broadcast-variables-what-are-they-and-how-do-i-use-them","tag-what-are-broadcast-variables-and-accumulators-in-apache-spark","tag-what-is-shared-varibles-in-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spark Shared Variable- Broadcast and Accumulators - TechVidvan<\/title>\n<meta name=\"description\" content=\"Spark Shared Variable- what is shared variable in spark,examples of Shared Variable,types of Shared Variables in spark: broadcast and accumulators variables\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/\" \/>\n<meta 
property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark Shared Variable- Broadcast and Accumulators - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Spark Shared Variable- what is shared variable in spark,examples of Shared Variable,types of Shared Variables in spark: broadcast and accumulators variables\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-16T11:42:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spark Shared Variable- Broadcast and Accumulators - TechVidvan","description":"Spark Shared Variable- what is shared variable in spark,examples of Shared Variable,types of Shared Variables in spark: broadcast and accumulators variables","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/","og_locale":"en_US","og_type":"article","og_title":"Spark Shared Variable- Broadcast and Accumulators - TechVidvan","og_description":"Spark Shared Variable- what is shared variable in spark,examples of Shared Variable,types of Shared Variables in spark: broadcast and accumulators variables","og_url":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2018-01-16T11:42:01+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Spark Shared Variable- Broadcast and Accumulators","datePublished":"2018-01-16T11:42:01+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/"},"wordCount":743,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg","keywords":["accumulators variables","Apache Spark Shared Variables : Broadcast and Accumulators","Broadcast variables","Explain shared variable in Spark","shared variables with spark","Spark Broadcast Variables - What are they and how do I use them","What are broadcast variables and accumulators in Apache Spark","what is shared varibles in spark"],"articleSection":["Spark Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/","url":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/","name":"Spark Shared Variable- Broadcast and Accumulators - 
TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg","datePublished":"2018-01-16T11:42:01+00:00","description":"Spark Shared Variable- what is shared variable in spark,examples of Shared Variable,types of Shared Variables in spark: broadcast and accumulators variables","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Spark-Shared-Variable-01.jpg","width":1200,"height":628,"caption":"two types of spark shared variables"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/spark-shared-variable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Spark Shared Variable- Broadcast and Accumulators"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan 
Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2032","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2032"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2032\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73283"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2032"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2032"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}