Battle: Apache Spark vs Hadoop MapReduce
Spark vs Hadoop is a popular debate nowadays, and the growing popularity of Apache Spark is what started it. In the big data world, Spark and Hadoop are both popular Apache projects. We...
“If you only read the books that everyone else is reading, you can only think what everyone else is thinking.” So, to learn Apache Spark efficiently, you can read the best books on it. This...
Scala is one of the most popular languages in big data, and it may replace Java soon. To learn it, we will introduce some Scala books in this blog. In today’s scenario, Scala is a hot language...
This tutorial is all about Hadoop and Spark compatibility. Together, Hadoop and Spark build a very powerful system to address all Big Data requirements. Spark complements Hadoop with tons of power; you can handle all...
Apache Spark has a well-defined, layered architecture in which all the components and layers are loosely coupled and integrated with one another. In this tutorial, we will discuss abstractions...
This article is all about the fault tolerance property of Spark. Self-recovery is one of the most powerful features of the Spark platform, which means that at any stage of failure, the RDD itself...
In this post, we will understand the concepts of the Apache Spark DAG, which stands for “Directed Acyclic Graph”. A DAG is a graph that keeps track of the operations applied to an RDD. Moving ahead...
In this blog, we will cover one of the important features of RDDs, Spark lazy evaluation. A Spark RDD (Resilient Distributed Dataset) is a collection of data elements partitioned across the cluster. It's a...
This tutorial gives information on the main entry point to Spark core, i.e. the Apache Spark SparkContext. Apache Spark is a powerful cluster computing engine designed for fast computation on big data....
In this article, we will learn about the Spark RDD persistence and caching mechanisms in detail. These are optimization techniques we use for Spark computations. We will go through why we need Spark RDD persistence...