
International Journal of Computer Engineering In Research Trends
Volume 3, Issue 3, March-2016, pp. 134-142    ISSN (O): 2349-7084
Available online at: www.ijcert.org

Clustering and Parallel Empowering Techniques for Hadoop File System

1 K. Naga Maha Lakshmi, 2 A. Shiva Kumar

1 Asst Professor, Keshav Memorial Institute of Technology and Science, Narayanguda, Telangana, India
2 Asst Professor, Mahaveer Institute of Technology and Science, Bandlaguda, Telangana, India

Abstract: In the Big Data community, Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, MapReduce has been seen as one of the key enabling approaches for handling large-scale query processing. These middleware are traditionally written with sockets and do not deliver the best performance on datacenters with modern high-performance networks. In this paper we investigate the characteristics of two file systems that support in-memory and heterogeneous storage, and discuss the impact of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications. We present a complete methodology for evaluating MapReduce and Spark workloads on top of in-memory file systems and provide insights about the interactions of different system components while running these workloads.

Keywords: Big Data, Big Data Analytics, MapReduce, Apache Spark


I. INTRODUCTION

The Hadoop framework is intended to provide a reliable, shared storage and analysis infrastructure to the user community. The storage portion of the Hadoop framework is provided by a distributed file system such as HDFS, while the analysis functionality is provided by MapReduce. Several other components are part of the overall Hadoop suite. The MapReduce functionality is designed as a tool for deep data analysis and the transformation of very large data sets. Hadoop enables users to explore and analyze complex data sets using customized analysis scripts and commands. In other words, by means of customized MapReduce routines, unstructured data sets can be distributed, analyzed, and explored across thousands of shared-nothing processing systems/clusters/nodes. Hadoop's HDFS replicates the data onto multiple nodes to protect the environment from potential data loss (for example, if one Hadoop node goes down, there are at least two other nodes holding the same data set).

Hadoop versus Relational Database Systems: Throughout the IT community there are many discussions revolving around comparing MapReduce with traditional RDBMS solutions. In short, MapReduce and RDBMS systems represent answers to completely different IT processing scenarios, so a real comparison amounts to identifying the opportunities and limitations of both solutions in view of their particular functionalities and focus areas. In any case, the data sets handled by traditional relational database (RDBMS) solutions are usually much smaller than the data pools used in a Hadoop environment (see Table 1). Consequently, unless an IT infrastructure processes terabytes or petabytes of unstructured data in a highly parallel environment, it can be stated that the performance of Hadoop executing MapReduce queries will be below average compared with SQL queries running against an (optimized) relational database.


Hadoop uses a brute-force access strategy, while RDBMS solutions rely on optimized access routines such as indexes, as well as read-ahead and write-behind techniques. Hence, Hadoop really only excels in scenarios that expose a massively parallel processing infrastructure where the data is so unstructured that no RDBMS optimization techniques can be applied to speed up the execution of the queries.

Table 1: RDBMS and MapReduce Highlights

Hadoop is essentially designed to handle large data volumes efficiently by connecting many commodity systems together so that they function as a single parallel unit. Enterprises are using Hadoop widely to analyze their data sets, because the Hadoop framework is based on a simple programming model (MapReduce) and enables a computing solution that is scalable, flexible, fault tolerant and cost effective. Here, the main concern is to maintain speed in processing huge datasets, in terms of both the waiting time between queries and the waiting time to run a program. Spark was introduced by the Apache Software Foundation to speed up the Hadoop computational process. Contrary to a common belief, Spark is not a modified version of Hadoop and is not, in general, dependent on Hadoop, because it has its own cluster management. Hadoop is only one of the ways to deploy Spark. Spark uses Hadoop in two ways: one is storage and the second is processing. Since Spark has its own cluster management, it uses Hadoop for storage purposes only.

II. DATA PROCESSING AND MANAGEMENT ON MODERN CLUSTERS

Big Data processing techniques analyze big data sets at terabyte or even petabyte scale, tackling arbitrary BI use cases, while real-time stream processing is performed on the most current slice of data for data profiling, to pick outliers, detect fraudulent transactions, monitor security, and so on. The toughest task, however, is to do fast (low latency) or real-time ad-hoc analytics on a complete big data set: it practically means you need to scan terabytes (or even more) of data within seconds. This is only possible when data is processed with high parallelism. In this section we present how data is actually processed and managed on modern clusters, which has a substantial impact on designing and utilizing modern data management and processing systems built in multiple tiers: front-end data accessing and serving (online), e.g. MySQL, HBase, and back-end data analytics (offline), e.g. HDFS, MapReduce, Spark.


Fig 1. Data accessing and serving over the internet

In the figure above, data accessing and serving over the internet happens in a front-end tier, where a web server handles requests as MySQL or NoSQL queries; later, in the back-end tier, data analytics is done with MapReduce or Spark over HDFS.

2.1. Spark - An Example of Back-end Data Processing Middleware

Spark is an in-memory data-processing framework that performs iterative machine learning jobs and interactive data analytics. It is scalable, communication- and I/O-intensive, supports wide dependencies between Resilient Distributed Datasets (RDDs), uses MapReduce-like shuffle operations to repartition RDDs, and relies on sockets-based communication.

2.2. Cluster computing frameworks

MapReduce is one of the earliest and best known commodity cluster frameworks. MapReduce follows the functional programming model [8], and performs explicit synchronization across computational stages. MapReduce exposes a simple programming API in terms of map() and reduce() functions. Apache Hadoop [1] is a widely used open source implementation of MapReduce. The simplicity of MapReduce is attractive for users, but the framework has several limitations. Applications such as machine learning and graph analytics iteratively process the data, which means multiple rounds of computation are performed on the same data. In MapReduce, every job reads its input data, processes it, and then writes it back to HDFS. For the next job to consume the output of a previously run job, it has to repeat the read, process, and write cycle. For iterative algorithms, which want to read once and iterate over the data many times, the MapReduce model poses a significant overhead. To overcome these limitations of MapReduce, Spark [19] uses Resilient Distributed Datasets (RDDs) [19], which implement in-memory data structures used to cache intermediate data across a set of nodes. Since RDDs can be kept in memory, algorithms can iterate over RDD data many times very efficiently. Although MapReduce is designed for batch jobs, it is widely used for iterative jobs. On the other hand, Spark has been designed mainly for iterative jobs, but it is also used for batch jobs. This is because the new big data architecture brings multiple frameworks together working on the same data, which is already stored in HDFS [17]. We choose to compare these two frameworks due to their widespread adoption in big data analytics. All the major Hadoop vendors, such as IBM, Cloudera, Hortonworks, and MapR, bundle both MapReduce and Spark with their Hadoop distributions.
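To make the contrast concrete, the following is a minimal Spark sketch in Scala (the input path and the deliberately simple update rule are assumptions) of the iterative pattern described above: the input RDD is cached in memory once and then reused across iterations, instead of being re-read from HDFS on every pass as a chain of MapReduce jobs would require.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeCachingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IterativeCachingSketch"))

    // Parse one numeric value per line and keep the RDD in memory (hypothetical path).
    val points = sc.textFile("hdfs:///data/points.txt")
                   .map(_.trim.toDouble)
                   .cache()

    // Each iteration reuses the cached partitions; nothing is re-read from disk.
    var estimate = 0.0
    for (i <- 1 to 10) {
      val error = points.map(p => p - estimate).mean()
      estimate += 0.5 * error
      println(s"iteration $i: estimate = $estimate")
    }

    sc.stop()
  }
}
```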
III. APACHE SPARK

Spark is another execution framework. Like MapReduce, it works with the file system to distribute your data across the cluster and process that data in parallel. Apache Spark is a fast and general engine for large-scale data processing. It is based on Hadoop MapReduce and extends the MapReduce model to use it efficiently for more types of computations, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application. Spark is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries and streaming. Apart from supporting all these workloads in a single system, it reduces the management burden of maintaining separate tools.

3.1. Features of Apache Spark

3.1.1. Speed

Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.


Fig 2. Logistic regression in Hadoop and Spark

3.1.2. Ease of Use (Supports multiple languages)

Write applications quickly in Java, Scala, Python or R. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells.

3.1.3. Advanced Analytics: Generality

Spark not only supports Map and Reduce; it also supports SQL queries, streaming data, machine learning (ML), and graph algorithms, so SQL, streaming, and complex analytics can be combined. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
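As a small illustration of these high-level operators, the following Scala lines (assuming a hypothetical log file path) can be pasted into spark-shell, where a SparkContext named sc is already available:

```scala
// Word frequencies over a text file, expressed with a handful of high-level operators.
val lines  = sc.textFile("hdfs:///data/access.log")
val words  = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

// Inspect the ten most frequent words interactively.
counts.sortBy(_._2, ascending = false).take(10).foreach(println)
```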
3.2. Spark Built on Hadoop

Fig 3. Spark built with Hadoop components.

3.2.1. Standalone: Spark standalone deployment means Spark occupies the place on top of HDFS (Hadoop Distributed File System) and space is allocated for HDFS explicitly. Here, Spark and MapReduce run side by side to cover all Spark jobs on the cluster.

3.2.2. Hadoop YARN: Hadoop YARN deployment means, simply, that Spark runs on YARN without any pre-installation or root access required. It helps to integrate Spark into the Hadoop ecosystem or Hadoop stack, and allows other components to run on top of the stack.

3.2.3. Spark in MapReduce (SIMR): Spark in MapReduce is used to launch Spark jobs in addition to standalone deployment. With SIMR, a user can start Spark and use its shell without any administrative access.

3.3. Components of Spark

Fig 4. Illustration of the different components of Spark.

3.3.1. Apache Spark Core: Spark Core is the underlying general execution engine for the Spark platform that all other functionality is built upon. It provides in-memory computing and the ability to reference datasets in external storage systems.

3.3.2. Spark SQL: Spark SQL is a component on top of Spark Core that introduces a new data abstraction called SchemaRDD, which provides support for structured and semi-structured data.

3.3.3. Spark Streaming: Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Datasets) transformations on those mini-batches of data.
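A minimal sketch of that mini-batch model is shown below (Scala, assuming a hypothetical text source on a local socket); every five-second batch is turned into an RDD and transformed with the usual operators:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))      // 5-second mini-batches

    val lines  = ssc.socketTextStream("localhost", 9999)   // hypothetical source
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.print()                                          // print each batch's word counts

    ssc.start()
    ssc.awaitTermination()
  }
}
```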
3.3.4. MLlib (Machine Learning Library): MLlib is a distributed machine learning framework above Spark, built on the distributed memory-based Spark architecture. According to benchmarks done by the MLlib developers against the Alternating Least Squares (ALS) implementations, Spark MLlib is nine times as fast as the Hadoop disk-based version of Apache Mahout (before Mahout gained a Spark interface).
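For illustration, here is a minimal sketch of the RDD-based MLlib API with the same ALS algorithm the benchmark above refers to (Scala; the ratings path, the "user,product,rating" layout, and the rank and iteration counts are assumptions):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AlsSketch"))

    // Hypothetical input: one "user,product,rating" triple per line.
    val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }

    // Train a factorization model (rank 10 and 10 iterations are illustrative values).
    val model = ALS.train(ratings, 10, 10)

    // Predict a rating for the observed (user, product) pairs and show a few.
    val predictions = model.predict(ratings.map(r => (r.user, r.product)))
    predictions.take(5).foreach(println)

    sc.stop()
  }
}
```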


3.3.5. GraphX: GraphX is a distributed graph-processing framework on top of Spark. It provides an API for expressing graph computations that can model user-defined graphs using the Pregel abstraction API. It also provides an optimized runtime for this abstraction.

3.4. Importance of Resilient Distributed Datasets (RDD) in Apache Spark

The Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark. It is an immutable distributed collection of objects. Each dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes. Formally, an RDD is a read-only, partitioned collection of records. RDDs can be created through deterministic operations on either data on stable storage or other RDDs. An RDD is a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat. Spark makes use of the concept of RDDs to achieve faster and more efficient MapReduce operations (Section IV revisits how MapReduce operations take place and why they are less efficient for iterative workloads).
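The two creation paths mentioned above look like this in a minimal Scala sketch (the HDFS path is a hypothetical example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCreationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddCreationSketch"))

    // 1. Parallelize an existing collection in the driver program.
    val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))
    println(s"sum = ${numbers.reduce(_ + _)}")

    // 2. Reference a dataset in an external storage system (here HDFS),
    //    read through a Hadoop InputFormat.
    val lines = sc.textFile("hdfs:///data/input.txt")
    println(s"lines = ${lines.count()}")

    sc.stop()
  }
}
```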
3.5. HDFS Architecture

Given below is the architecture of a Hadoop File System.

Fig 5. Architecture of a Hadoop File System

HDFS follows the master-slave architecture and it has the following elements.

Namenode: The namenode is the commodity hardware that contains the GNU/Linux operating system and the namenode software. It is software that can be run on commodity hardware. The system running the namenode acts as the master server and it does the following tasks: 1. Manages the file system namespace. 2. Regulates clients' access to files. It also executes file system operations such as renaming, closing, and opening files and directories.

Datanode: The datanode is commodity hardware having the GNU/Linux operating system and the datanode software. For every node (commodity hardware/system) in a cluster, there will be a datanode. These nodes manage the data storage of their system.
- Datanodes perform read-write operations on the file systems, as per client request.
- They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.

Block: Generally the user data is stored in the files of HDFS. A file in the file system is divided into one or more segments and/or stored in individual data nodes. These file segments are called blocks. In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB, but it can be increased as needed by changing the HDFS configuration.

3.6. Goals of HDFS

Fault detection and recovery: Since HDFS includes a large number of commodity hardware components, failure of components is frequent. Therefore HDFS should have mechanisms for quick and automatic fault detection and recovery.

Huge datasets: HDFS should have hundreds of nodes per cluster to manage applications having huge datasets.

Hardware at data: A requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.
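To make the storage layer concrete, here is a minimal sketch (Scala, with a hypothetical file path) that talks to HDFS through Hadoop's FileSystem client API, which MapReduce and Spark use underneath, and reports the block size and replication factor discussed above:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsClientSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()        // picks up core-site.xml / hdfs-site.xml settings
    val fs   = FileSystem.get(conf)

    val file   = new Path("/data/input.txt")
    val status = fs.getFileStatus(file)
    println(s"length      = ${status.getLen} bytes")
    println(s"block size  = ${status.getBlockSize} bytes")  // 64 MB by default on older clusters
    println(s"replication = ${status.getReplication}")

    fs.close()
  }
}
```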


IV. MAPREDUCE OVERVIEW

Apache Hadoop MapReduce is a framework for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process. The job configuration supplies the map and reduce analysis functions, and the Hadoop framework provides the scheduling, distribution, and parallelization services. The top-level unit of work in MapReduce is a job. A job usually has a map and a reduce phase, though the reduce phase can be omitted. For example, consider a MapReduce job that counts the number of times each word is used across a set of documents. The map phase counts the words in each document, then the reduce phase aggregates the per-document data into word counts spanning the entire collection.
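As a concrete, self-contained illustration of that word-count job, the following Scala sketch simulates the three stages (map, shuffle, reduce) on plain in-memory collections, so the key-value flow is visible without a cluster; the document names and contents are made up:

```scala
object WordCountSimulation {
  // Map phase: (documentName, text) -> list of (word, 1) pairs.
  def mapPhase(doc: (String, String)): Seq[(String, Int)] =
    doc._2.toLowerCase.split("\\W+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

  // Shuffle: group every map output value under its key -> (word, list of counts).
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => (word, kvs.map(_._2)) }

  // Reduce phase: (word, list of counts) -> (word, total).
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (word, counts) => (word, counts.sum) }

  def main(args: Array[String]): Unit = {
    val documents = Seq(
      "doc1" -> "the quick brown fox",
      "doc2" -> "the lazy dog and the fox"
    )
    val mapped  = documents.flatMap(mapPhase)     // all intermediate (word, 1) pairs
    val reduced = reducePhase(shuffle(mapped))    // final word counts
    reduced.toSeq.sortBy(-_._2).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```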
During the map phase, the input data is divided into input splits for analysis by map tasks running in parallel across the Hadoop cluster. By default, the MapReduce framework gets its input data from the Hadoop Distributed File System (HDFS). Using the MarkLogic Connector for Hadoop enables the framework to get input data from a MarkLogic Server instance instead. For details, see Map Task.

The reduce phase uses results from map tasks as input to a set of parallel reduce tasks. The reduce tasks consolidate the data into final results. By default, the MapReduce framework stores results in HDFS. Using the MarkLogic Connector for Hadoop enables the framework to store results in a MarkLogic Server instance. For details, see Reduce Task.

Although the reduce phase depends on output from the map phase, map and reduce processing is not necessarily sequential. That is, reduce tasks can begin as soon as any map task completes; it is not necessary for all map tasks to complete before any reduce task can begin. MapReduce operates on key-value pairs. Conceptually, a MapReduce job takes a set of input key-value pairs and produces a set of output key-value pairs by passing the data through map and reduce functions. The map tasks produce an intermediate set of key-value pairs that the reduce tasks use as input, representing the movement from input key-value pairs to output key-value pairs at a high level.

Though each set of key-value pairs is homogeneous, the key-value pairs in each step need not have the same type. For example, the key-value pairs in the input set (KV1) can be (string, string) pairs, with the map phase producing (string, integer) pairs as intermediate results (KV2), and the reduce phase producing (integer, string) pairs for the final results (KV3); see the word-count sketch above.

The keys in the map output pairs need not be unique. Between the map processing and the reduce processing, a shuffle step sorts all map output values with the same key into a single reduce input (key, value-list) pair, where the 'value' is a list of all values sharing the same key. Thus, the input to a reduce task is actually a set of (key, value-list) pairs.

The key and value types at each stage determine the interfaces to your map and reduce functions. Therefore, before coding a job, determine the data types needed at each stage in the map-reduce process. For example:

1. Choose the reduce output key and value types that best represent the desired outcome.

2. Choose the map input key and value types best suited to represent the input data from which to derive the final result.

3. Determine the transformation necessary to get from the map input to the reduce output, and choose the intermediate map output/reduce input key and value types to match.

What is MapReduce? It is the part of the Hadoop framework that is responsible for processing large data sets with a parallel and distributed algorithm on a cluster. As the name suggests, the MapReduce algorithm contains two important tasks: Map and Reduce.


Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). Reduce, on the other hand, takes the output from a map as input and combines the data tuples into a smaller set of tuples. In MapReduce, the data is distributed over the cluster and processed.

The difference in Spark is that it performs in-memory processing of data. This in-memory processing is faster because no time is spent moving data and processes in and out of the disk, whereas MapReduce requires a lot of time to perform these input/output operations, thereby increasing latency.

Fig 6. Operations of MapReduce

5.1. Major differences between MapReduce and Spark

Real-Time Big Data Analysis: Real-time data analysis means processing data generated by real-time event streams coming in at the rate of millions of events per second, Twitter data for instance. The strength of Spark lies in its ability to support streaming of data along with distributed processing. This is a useful combination that delivers near real-time processing of data. MapReduce lacks such an advantage, as it was designed to perform batch-cum-distributed processing on large amounts of data. Real-time data can still be processed on MapReduce, but its speed is nowhere close to that of Spark. Spark claims to process data 100x faster than MapReduce, and 10x faster when going through the disks.

How is Spark compatible with Apache Hadoop?
- Spark can run on Hadoop 2's YARN cluster manager.
- Spark can read any existing Hadoop data.
- Hive, Pig and Mahout can run on Spark.

Advanced Big Data Analytics is complex:
- Hadoop MapReduce (MR) works really well if you can express your problem as a single MR job. In practice, most problems do not fit neatly into a single MR job.
- MR is slow for advanced Big Data Analytics such as iterative processing and caching of datasets, which are well suited to machine learning. Spark improves on MapReduce by removing the need to write data to disk between steps, as sketched below.
- There is a need to integrate many different tools for advanced Big Data Analytics covering queries, streaming analytics, machine learning and graph processing.
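The "no disk writes between steps" point can be seen in a minimal Spark sketch (Scala, with a hypothetical log layout and paths) where several stages are chained inside one job and only the final result touches HDFS:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MultiStagePipelineSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MultiStagePipelineSketch"))

    val events  = sc.textFile("hdfs:///logs/events")              // stage 1: ingest
    val parsed  = events.map(_.split('\t'))                       // stage 2: parse
    val errors  = parsed.filter(fields => fields(1) == "ERROR")   // stage 3: filter
    val perHost = errors.map(fields => (fields(0), 1L))           // stage 4: re-key
                        .reduceByKey(_ + _)                       // stage 5: aggregate (single shuffle)

    // Only the final result is written out; intermediate stages stay inside the job.
    perHost.saveAsTextFile("hdfs:///reports/errors-per-host")
    sc.stop()
  }
}
```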
5.2. Key capabilities of Spark
- Fast parallel data processing, as intermediate data is stored in memory and only persisted to disk if needed.
- Apache Spark provides a higher-level abstraction and generalization of MapReduce, with over 80 high-level built-in operators. Besides MapReduce, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out of the box.
- MapReduce is limited to batch processing. Spark lets you write streaming and batch-mode applications with very similar logic and APIs instead of using different tools.


- Apache Spark provides a set of composable building blocks for writing concise queries and data flows in Scala, Java or Python. This makes developers highly productive. It is possible to build a single data workflow that leverages streaming, batch, SQL and machine learning, for example.
- Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase and S3.

VI. PERFORMANCE

Iterative Machine Learning Algorithms: Almost all machine learning algorithms work iteratively. As we have seen earlier, iterative algorithms involve I/O bottlenecks in the MapReduce implementations. MapReduce uses coarse-grained tasks (task-level parallelism) that are too heavy for iterative algorithms. Spark, with the help of Mesos, a distributed system kernel, caches the intermediate dataset after each iteration and runs multiple iterations on this cached dataset, which reduces the I/O and helps to run the algorithm faster in a fault-tolerant manner. Spark has a built-in scalable machine learning library called MLlib, which contains high-quality algorithms that leverage iterations and yield better results than the one-pass approximations sometimes used on MapReduce.
Iterative Operations on Spark RDD: The illustration below (Figure 7) shows the iterative operations on Spark RDD. Spark stores intermediate results in distributed memory instead of stable storage (disk), which makes the system faster. Note: if the distributed memory (RAM) is not sufficient to store intermediate results (the state of the job), then it will store those results on the disk.
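That memory-first, disk-as-overflow behaviour corresponds to Spark's storage levels; a minimal sketch (Scala, hypothetical input path) is:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistenceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PersistenceSketch"))

    // Keep partitions in RAM and spill to disk only when RAM is not sufficient.
    val intermediate = sc.textFile("hdfs:///data/big-input")
                         .map(_.length.toLong)
                         .persist(StorageLevel.MEMORY_AND_DISK)

    // Several actions over the same intermediate results reuse the persisted partitions.
    println(s"records     = ${intermediate.count()}")
    println(s"total bytes = ${intermediate.sum()}")

    intermediate.unpersist()
    sc.stop()
  }
}
```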

Figure 7: Iterative operations on Spark RDD

If different queries are run on the same set of data repeatedly, that particular data can be kept in memory for better execution times.

Figure 8: Interactive operations on Spark RDD

In a Big Data context, each of these cycles can be very burdensome, with every test cycle, for instance, being hours long. While there are various ways to mitigate this issue, one of the most effective is simply to run your program fast. Thanks to the performance advantages of Spark, the development lifecycle can be shortened considerably, simply because the test and debug cycles are much shorter.

Figure 9. Performance of Spark


Here are results from a survey on Spark taken by Typesafe to better understand the trends and the growing demand for Spark.

Fig 10. Apache Spark survey report

VII. CONCLUSIONS

In this paper we looked into the shortcomings of MapReduce in meeting the continuously growing computing demands of Big Data, and we focused on a substitute for the MapReduce technique, one of the key enabling approaches for handling Big Data requests by means of massively parallel processing on large numbers of nodes. The issues and difficulties MapReduce faces while managing huge data lead to Apache Spark: it has comparable levels of security, manageability, and flexibility to MapReduce, and should be similarly integrated with the rest of the technologies that make up the continuously evolving Hadoop platforms.

REFERENCES:

[1] P. Zadrozny and R. Kodali, Big Data Analytics using Splunk, Berkeley, CA, USA: Apress, 2013.

[2] F. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money, Hoboken, NJ, USA: Wiley, 2013.

[3] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, 51(1), pp. 107-113, 2008.

[4] Apache Hadoop, http://hadoop.apache.org.

[5] F. Li, B. C. Ooi, M. T. Özsu and S. Wu, "Distributed data management using MapReduce," ACM Computing Surveys, 46(3), pp. 1-42, 2014.

[6] C. Doulkeridis and K. Nørvåg, "A survey of large-scale analytical query processing in MapReduce," The VLDB Journal, pp. 1-26, 2013.

[7] P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar and R. Pasquin, "Incoop: MapReduce for incremental computations," Proc. of the 2nd ACM Symposium on Cloud Computing, 2011.

