Apache Spark

Spark is a framework for data analytics on a distributed computing cluster. It performs computations in memory, which makes data processing faster than MapReduce for many workloads. It can read from the Hadoop Distributed File System (HDFS) and can run on top of an existing Hadoop cluster. It also processes structured data in Hive, as well as streaming data from sources such as HDFS, Flume, Kafka, and Twitter.
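To make the in-memory idea more concrete, here is a minimal sketch in plain Python. It does not use the Spark API. The class `DiskDataset` and the functions `run_two_passes` and `demo` are hypothetical names invented for this illustration. The sketch only shows the general principle: when several computations reuse the same data, keeping it in memory avoids going back to disk for every pass.

```python
import os
import tempfile


class DiskDataset:
    """Toy dataset backed by a text file; counts how often the file is read."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0
        self._cached = None  # in-memory copy, filled by cache()

    def _read_from_disk(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]

    def cache(self):
        # Load the records once and keep them in memory for later passes.
        if self._cached is None:
            self._cached = self._read_from_disk()
        return self

    def records(self):
        # Serve from memory when cached, otherwise go back to disk.
        if self._cached is not None:
            return self._cached
        return self._read_from_disk()


def run_two_passes(dataset):
    """Run two independent computations over the same data."""
    evens = [x for x in dataset.records() if x % 2 == 0]
    total = sum(dataset.records())
    return len(evens), total


def demo():
    # Write the numbers 1..10 to a temporary file, one per line.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(str(i) for i in range(1, 11)))
        path = f.name
    try:
        uncached = DiskDataset(path)
        uncached_result = run_two_passes(uncached)

        cached = DiskDataset(path).cache()
        cached_result = run_two_passes(cached)

        return uncached_result, uncached.disk_reads, cached_result, cached.disk_reads
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Both variants compute the same answers, but the uncached dataset reads the file once per pass while the cached one reads it a single time. Spark applies a similar idea across a cluster, at a much larger scale and with fault tolerance that this toy does not attempt to model.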

Hadoop vs Spark


