Category: Spark Tutorials

Apache Spark vs Hadoop MapReduce

Battle: Apache Spark vs Hadoop MapReduce

Spark vs Hadoop is a popular comparison these days, and the growing popularity of Apache Spark is what started it. In the big data world, Spark and Hadoop are both popular Apache projects. We...

Top Scala Books

Best Scala Books: For Beginners & Experienced

Scala is one of the most popular languages for big data, and it may replace Java soon. To help you learn it, this blog introduces some Scala books. Today, Scala is a hot language...

Spark Architecture & Internal Working

Apache Spark has a well-defined, layered architecture in which all the components and layers are loosely coupled and integrated with one another. In this tutorial, we will discuss abstractions...

Fault Tolerance in Apache Spark

Fault Tolerance in Spark: Self-Recovery Property

This article is all about fault tolerance in Spark. Self-recovery is one of the most powerful features of the Spark platform, which means that at any stage of failure, the RDD itself...

Spark DAG Visualisation

Apache Spark DAG: Directed Acyclic Graph

In this post, we will look at the concept of the Apache Spark DAG, which stands for “Directed Acyclic Graph”. A DAG is a graph that keeps track of the operations applied to an RDD. Moving ahead...

Spark Lazy Evaluation

Apache Spark Lazy Evaluation: In Spark RDD

In this blog, we will cover one of the important features of RDDs: Spark lazy evaluation. A Spark RDD (Resilient Distributed Dataset) is a collection of data elements partitioned across the cluster. It's a...

RDD Persistence And Caching Mechanism

Persistence And Caching Mechanism In Apache Spark

In this article, we will learn about Spark RDD persistence and caching mechanisms in detail. These are optimization techniques we use for Spark computations. We will go through why we need Spark RDD persistence...