{"id":2040,"date":"2018-01-19T12:05:27","date_gmt":"2018-01-19T12:05:27","guid":{"rendered":"https:\/\/techvidvan.com\/tutorials\/?p=1065"},"modified":"2018-01-19T12:05:27","modified_gmt":"2018-01-19T12:05:27","slug":"spark-design-principles","status":"publish","type":"post","link":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/","title":{"rendered":"Apache Spark Design Principles- Why Spark Matters"},"content":{"rendered":"<p>Apache Spark has recently become a prominent player in the big data world, and big data companies are adopting it at a striking rate. That raises a natural question: what are the major design principles behind Apache Spark?<\/p>\n<p>In this blog, we will cover the <em>principles of design in Spark<\/em>. First, we will look at why Spark matters. Then we will walk through the key requirements that shaped the design of Apache Spark.<\/p>\n<h3>Why Spark Matters?<\/h3>\n<p>There are several reasons why Apache Spark matters. Some of them are:<\/p>\n<h4>a. Spark is fast<\/h4>\n<p>Compared with Hadoop MapReduce, Spark can run some analytics workloads orders of <span class=\"complexword\">magnitude<\/span> faster. That speed makes interactive analysis and faster experimentation practical, which increases productivity for analysts.<\/p>\n<h4>b. Spark is developer-friendly<\/h4>\n<p>From a developer\u2019s point of view, Spark is easy to use as well as powerful. Spark is written in Scala, a relatively new programming language, and developers still enjoy its concise and fluid style of programming. Moreover, Spark offers high-level APIs in Java, Scala, Python, and R.<\/p>\n<h4>c. In-memory processing<\/h4>\n<p>A key feature of Spark is in-memory processing. It is the feature that lets the technology deliver its speed. 
It also improves on the performance of conventional big data processing.<\/p>\n<p><span class=\"complexword\">However<\/span>, in-memory processing is not a new computing concept. There is a long list of databases and data-processing products that use it, for example, Redis and VoltDB.<\/p>\n<p>Another example is Apache Ignite, which supplements in-memory processing with a write-ahead log (WAL) and supports <strong>ACID<\/strong> (atomicity, consistency, isolation, durability) transactions. Spark is likewise equipped with in-memory processing capability.<\/p>\n<h4>d. Spark is \u201clazy\u201d<\/h4>\n<p>The most important principle underlying Spark\u2019s operational performance is \u201claziness\u201d. Spark does not execute transformations until an action is requested.<\/p>\n<p>The main advantage is that laziness minimizes disk and network I\/O, which enables Spark to perform well at scale. This differs from the MapReduce process: instead of returning the high-volume data generated by the map step, which <span class=\"passivevoice\">is consumed by<\/span> the reduce step, Spark returns only the much smaller result of the reduce step to the driver program.<\/p>\n<h4>e. Cluster and programming language support<\/h4>\n<p>Apache Spark is a distributed computing framework. As such, it needs robust cluster management and needs to scale out <span class=\"adverb\">horizontally<\/span>. Spark is in demand for its effective use of CPU cores across thousands of server nodes.<\/p>\n<p>Besides its own standalone cluster manager, Spark supports two more cluster managers: Hadoop YARN and Apache Mesos.<\/p>\n<h4>f. 
Spark Streaming<\/h4>\n<p><span style=\"font-size: 16px\">Data streaming is often a <\/span><span class=\"complexword\" style=\"font-size: 16px;text-align: right\">requirement<\/span><span style=\"font-size: 16px;text-align: right\"> on top of building an OLAP system. Apache Spark provides a streaming library that offers <\/span>fault-tolerant<span style=\"font-size: 16px;text-align: right\"> distributed streaming functionality. <\/span><\/p>\n<p><span style=\"font-size: 16px;text-align: right\">It performs streaming by treating small contiguous chunks of data as a sequence of <\/span>Spark RDDs<span style=\"font-size: 16px;text-align: right\">, Spark\u2019s core data structure.<\/span><\/p>\n<h3>Apache Spark Design Principles<\/h3>\n<p>Before Spark, the industry had long needed a general-purpose cluster computing tool. In the Hadoop ecosystem, we needed many different tools to <span class=\"complexword\">satisfy<\/span> various requirements, such as:<\/p>\n<ul>\n<li>Hadoop MapReduce for batch processing.<\/li>\n<li>Apache Storm \/ S4 for stream processing.<\/li>\n<li>Apache Impala \/ Apache Tez for interactive processing.<\/li>\n<li>Neo4j \/ Apache Giraph for graph processing.<\/li>\n<\/ul>\n<p>Therefore, there was strong demand in the industry for a powerful engine that can process data in real time (streaming) as well as in batch mode.<\/p>\n<p>Moreover, we needed an engine that can respond in under a second and perform in-memory processing. Hence, the foremost of Spark\u2019s design principles is the need for a unified engine.<\/p>\n<p><strong>A Unified Engine- Spark Design<\/strong><\/p>\n<p>Apache Spark comes with higher-level libraries, including support for SQL queries and streaming data, as well as machine learning and graph processing.<\/p>\n<p>These standard libraries enhance developer productivity. 
Ultimately, Apache Spark has fulfilled the demand for a unified engine, one that itself provides several tools to run processing easily and quickly.<\/p>\n<p>Apache Spark <span class=\"passivevoice\">was designed<\/span> around these requirements, and it has turned out to be a <strong>powerful open source engine<\/strong>. It provides real-time stream processing as well as interactive processing.<\/p>\n<p>We can also use it for graph, in-memory, and batch processing. The best part is that a single system handles all of these workloads at high speed. It also offers ease of use and a standard interface to users.<\/p>\n<h3>Conclusion<\/h3>\n<p>We have seen how being a unified engine makes Spark prominent. Apache Spark is now a frontrunner in the big data space. It has many complementary features, and it is the collective strength of those features that truly makes Spark stand out from the rest.<\/p>\n<p>We hope this look at Spark\u2019s design principles and why Spark matters answers your questions. We would like to hear your feedback.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, we have seen Apache Spark became a prominent player in the big data world. There is a huge spark adoption by big data companies, even at an eye-catching rate. 
But then always a&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":73356,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[614],"tags":[990,991,992,993,994,675],"class_list":["post-2040","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","tag-apache-spark-design-principles","tag-apache-spark-design-principles-why-spark-matters","tag-apache-spark-motivations-and-design-principles","tag-principles-of-design-in-spark","tag-why-spark","tag-why-spark-matters"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Spark Design Principles- Why Spark Matters - TechVidvan<\/title>\n<meta name=\"description\" content=\"Apache spark design principles- why spark matters,spark is fast,developer friendly,spark streaming,Spark in-memory processing,spark laziness, Spark cluster support\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Spark Design Principles- Why Spark Matters - TechVidvan\" \/>\n<meta property=\"og:description\" content=\"Apache spark design principles- why spark matters,spark is fast,developer friendly,spark streaming,Spark in-memory processing,spark laziness, Spark cluster support\" \/>\n<meta property=\"og:url\" content=\"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/\" \/>\n<meta property=\"og:site_name\" content=\"TechVidvan\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TechVidvan\/\" \/>\n<meta property=\"article:published_time\" 
content=\"2018-01-19T12:05:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"TechVidvan Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:site\" content=\"@vidvantech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TechVidvan Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Spark Design Principles- Why Spark Matters - TechVidvan","description":"Apache spark design principles- why spark matters,spark is fast,developer friendly,spark streaming,Spark in-memory processing,spark laziness, Spark cluster support","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/","og_locale":"en_US","og_type":"article","og_title":"Apache Spark Design Principles- Why Spark Matters - TechVidvan","og_description":"Apache spark design principles- why spark matters,spark is fast,developer friendly,spark streaming,Spark in-memory processing,spark laziness, Spark cluster 
support","og_url":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/","og_site_name":"TechVidvan","article_publisher":"https:\/\/www.facebook.com\/TechVidvan\/","article_published_time":"2018-01-19T12:05:27+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg","type":"image\/jpeg"}],"author":"TechVidvan Team","twitter_card":"summary_large_image","twitter_creator":"@vidvantech","twitter_site":"@vidvantech","twitter_misc":{"Written by":"TechVidvan Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#article","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/"},"author":{"name":"TechVidvan Team","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22"},"headline":"Apache Spark Design Principles- Why Spark Matters","datePublished":"2018-01-19T12:05:27+00:00","mainEntityOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/"},"wordCount":835,"commentCount":0,"publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg","keywords":["Apache Spark design Principles","Apache Spark Design Principles: Why Spark Matters","Apache Spark: Motivations and Design Principles","principles of design in spark","Why Spark","Why Spark Matters"],"articleSection":["Spark 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/","url":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/","name":"Apache Spark Design Principles- Why Spark Matters - TechVidvan","isPartOf":{"@id":"https:\/\/techvidvan.com\/tutorials\/#website"},"primaryImageOfPage":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#primaryimage"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#primaryimage"},"thumbnailUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg","datePublished":"2018-01-19T12:05:27+00:00","description":"Apache spark design principles- why spark matters,spark is fast,developer friendly,spark streaming,Spark in-memory processing,spark laziness, Spark cluster support","breadcrumb":{"@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#primaryimage","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2019\/11\/Why-Spark-Matter.jpg","width":1200,"height":628,"caption":"why apache spark matters"},{"@type":"BreadcrumbList","@id":"https:\/\/techvidvan.com\/tutorials\/spark-design-principles\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/techvidvan.com\/tutorials\/"},{"@type":"ListItem","position":2,"name":"Apache Spark Design Principles- Why Spark 
Matters"}]},{"@type":"WebSite","@id":"https:\/\/techvidvan.com\/tutorials\/#website","url":"https:\/\/techvidvan.com\/tutorials\/","name":"TechVidvan Blogs","description":"","publisher":{"@id":"https:\/\/techvidvan.com\/tutorials\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/techvidvan.com\/tutorials\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/techvidvan.com\/tutorials\/#organization","name":"TechVidvan","url":"https:\/\/techvidvan.com\/tutorials\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/","url":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","contentUrl":"https:\/\/techvidvan.com\/tutorials\/wp-content\/uploads\/2024\/03\/techvidvan-logo-200x50-1.webp","width":200,"height":50,"caption":"TechVidvan"},"image":{"@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TechVidvan\/","https:\/\/x.com\/vidvantech"]},{"@type":"Person","@id":"https:\/\/techvidvan.com\/tutorials\/#\/schema\/person\/e9c26e74dd3d87421f7ada9433b8cd22","name":"TechVidvan Team","description":"The TechVidvan Team delivers practical, beginner-friendly tutorials on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our experts are here to help you upskill and excel in today\u2019s tech industry."}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2040","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/comments?post=2040"}],"version-history":[{"count":0,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/posts\/2040\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media\/73356"}],"wp:attachment":[{"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/media?parent=2040"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/categories?post=2040"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techvidvan.com\/tutorials\/wp-json\/wp\/v2\/tags?post=2040"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}