Spark

Uploaded by manasapalireddy
Copyright © All Rights Reserved
What is Spark?

Apache Spark is an open-source, distributed processing
system used for big data workloads. It utilizes in-memory
caching and optimized query execution for fast queries
against data of any size. Simply put, Spark is a fast and
general engine for large-scale data processing.
The fast part means that it is faster than earlier
approaches to working with big data, such as
classical MapReduce. The main reason is that
Spark keeps intermediate data in memory (RAM) instead of
writing it to disk between processing steps, and memory
access is much faster than disk access.
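To make the disk-versus-memory point concrete, here is a toy sketch in plain Python (not Spark code; `step`, `run_with_disk`, and `run_in_memory` are hypothetical names). It assumes an iterative job whose intermediate result is either written to disk and read back before every iteration, roughly as chained MapReduce jobs do, or simply kept in RAM. Both versions compute the same answer; they differ only in the disk I/O the first one performs.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a hypothetical iterative job: halve each value and add 1."""
    return [v * 0.5 + 1 for v in values]


def run_with_disk(values, iterations):
    """MapReduce-like: every intermediate result is written to disk and read back."""
    round_trips = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage.json")
        with open(path, "w") as f:
            json.dump(values, f)  # input stored on disk
        for _ in range(iterations):
            with open(path) as f:
                current = json.load(f)  # read the previous stage from disk
            with open(path, "w") as f:
                json.dump(step(current), f)  # write this stage back to disk
            round_trips += 1
        with open(path) as f:
            result = json.load(f)
    return result, round_trips


def run_in_memory(values, iterations):
    """In-memory style: the intermediate result stays in RAM between iterations."""
    current = values
    for _ in range(iterations):
        current = step(current)
    return current, 0
```

Called with `[1.0, 2.0]` and three iterations, both functions return `[1.875, 2.0]`; only the disk version performs three write/read round trips to get there.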
The general part means that it can be used for many
tasks, such as running distributed SQL queries, building data
pipelines, ingesting data into a database, running
machine learning algorithms, and working with graphs or
data streams.
• Components
• Apache Spark Core – Spark Core is the underlying general execution engine for
the Spark platform upon which all other functionality is built. It provides
in-memory computing and the ability to reference datasets in external storage systems.
• Spark SQL – Spark SQL is Apache Spark’s module for working with structured
data. The interfaces offered by Spark SQL provide Spark with more information
about the structure of both the data and the computation being performed.
• Spark Streaming – This component allows Spark to process real-time streaming
data. Data can be ingested from many sources like Kafka, Flume, and HDFS
(Hadoop Distributed File System). Then the data can be processed using
complex algorithms and pushed out to file systems, databases, and live
dashboards.
• MLlib (Machine Learning Library) – Apache Spark is equipped with a rich library
known as MLlib. This library contains a wide array of machine learning
algorithms, including classification, regression, clustering, and collaborative
filtering. It also includes tools for constructing, evaluating, and tuning ML
pipelines. These algorithms are designed to scale out across a cluster.
• GraphX – Spark also comes with a library called GraphX for manipulating graphs
and performing graph-parallel computations. GraphX unifies the ETL (Extract,
Transform, and Load) process, exploratory analysis, and iterative graph
computation within a single system.
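Spark Core's central abstraction is the Resilient Distributed Dataset (RDD): an immutable, partitioned collection that records how it was derived (its lineage), evaluates transformations lazily, and can be cached in memory for reuse. The sketch below is a simplified, single-process toy model of that idea in plain Python. `ToyRDD` and its snake_case methods are hypothetical and only loosely mirror PySpark's `parallelize`, `flatMap`, `map`, `reduceByKey`, `cache`, and `collect`; distribution, fault tolerance, and map-side combining are deliberately left out.

```python
from collections import defaultdict
from functools import reduce


class ToyRDD:
    """Hypothetical single-process model of an RDD (not PySpark).

    Each instance holds a function that computes its partitions from its
    parent, i.e. its lineage. Nothing runs until an action is called.
    """

    def __init__(self, compute_partitions):
        self._compute_partitions = compute_partitions  # () -> list of partitions
        self._keep_in_memory = False
        self._cached = None
        self.computations = 0  # times this dataset's partitions were computed

    @staticmethod
    def parallelize(data, num_partitions=2):
        data = list(data)
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(lambda: parts)

    def _partitions(self):
        if self._cached is not None:
            return self._cached  # reuse the in-memory copy
        self.computations += 1
        parts = self._compute_partitions()  # walks the lineage back to the source
        if self._keep_in_memory:
            self._cached = parts
        return parts

    # Transformations: return a new ToyRDD and compute nothing yet.
    def flat_map(self, f):
        return ToyRDD(lambda: [[y for x in p for y in f(x)] for p in self._partitions()])

    def map(self, f):
        return ToyRDD(lambda: [[f(x) for x in p] for p in self._partitions()])

    def reduce_by_key(self, f, num_partitions=2):
        def compute():
            # Simplified "shuffle": send every (key, value) pair to the
            # output partition chosen by hashing the key.
            buckets = [defaultdict(list) for _ in range(num_partitions)]
            for p in self._partitions():
                for key, value in p:
                    buckets[hash(key) % num_partitions][key].append(value)
            # Combine all values of each key with the user's function.
            return [[(key, reduce(f, values)) for key, values in bucket.items()]
                    for bucket in buckets]
        return ToyRDD(compute)

    def cache(self):
        self._keep_in_memory = True  # keep partitions after the first computation
        return self

    # Action: triggers computation of the whole lineage.
    def collect(self):
        return [x for p in self._partitions() for x in p]
```

For a word count, `ToyRDD.parallelize(["to be or not", "to be"]).flat_map(str.split).map(lambda w: (w, 1)).reduce_by_key(lambda a, b: a + b).cache()` builds the lineage without doing any work; sorting the result of `collect()` gives `[('be', 2), ('not', 1), ('or', 1), ('to', 2)]`, and because the dataset is cached, a second `collect()` reuses the stored partitions instead of recomputing them.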
• Features
• Fast processing – Speed is the feature that has done most to make the big data
world choose Apache Spark over other technologies. Big data is characterized by
volume, variety, velocity, and veracity, and it needs to be processed quickly.
Spark's Resilient Distributed Datasets (RDDs) can be kept in memory, which saves
time otherwise spent on disk reads and writes and allows some workloads to run
roughly ten to one hundred times faster than on Hadoop MapReduce.
• Flexibility – Apache Spark supports multiple languages and allows the
developers to write applications in Java, Scala, R, or Python.
• In-memory computing – Spark can keep data in the RAM of the cluster's servers,
which allows quick access and in turn speeds up analytics.
• Real-time processing – Spark can process real-time streaming data.
Unlike MapReduce, which processes only stored data, Spark can process data
as it arrives and can therefore produce results with low latency.
• Better analytics – Whereas MapReduce offers only Map and Reduce functions,
Spark provides much more: SQL queries, machine learning algorithms, graph
processing, and other complex analytics. With these functionalities, analytics
can be expressed more easily and performed more effectively with Spark.
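Real-time processing in Spark Streaming works on micro-batches: the live stream is divided into small batches, each batch is processed by the same engine used for batch jobs, and state such as running totals can be carried from one batch to the next. The plain-Python toy below (hypothetical functions `micro_batches` and `running_word_counts`, not Spark's API) imitates that loop for a running word count. As a simplifying assumption it groups events by count rather than by time interval, so the output is deterministic.

```python
from collections import Counter


def micro_batches(events, batch_size):
    """Group an incoming stream into fixed-size micro-batches.

    Spark Streaming cuts the stream by a time interval; a fixed count is
    used here only to keep the toy deterministic.
    """
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partial batch


def running_word_counts(lines, batch_size):
    """Process each micro-batch as it arrives and yield the updated totals."""
    totals = Counter()  # state carried from one batch to the next
    for batch in micro_batches(lines, batch_size):
        totals.update(w for line in batch for w in line.split())
        yield dict(totals)
```

Feeding it `["error disk", "ok", "error net", "ok ok"]` with `batch_size=2` yields `{'error': 1, 'disk': 1, 'ok': 1}` after the first batch and `{'error': 2, 'disk': 1, 'ok': 3, 'net': 1}` after the second, so an up-to-date result is available after every batch rather than only at the end.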
