Hadoop ppt@87

The document summarizes a seminar on Hadoop. It introduces Hadoop as an open source framework written in Java that allows distributed processing of large datasets across computer clusters. It describes Hadoop's architecture, applications, advantages like scalability and fault tolerance, and disadvantages like security concerns. Examples of when to use Hadoop include when data is too large for standard tools or for large-scale extraction, transformation, and loading of data.

Uploaded by

Srichand Kalam

Seminar

On
HADOOP

Seminar by:
M.SRICHAND
 Introduction
 What is Hadoop?
 Hadoop Applications
 Hadoop Architecture
 Importance
 Advantages
 Disadvantages
 When to use Hadoop?
 Reference

 Hadoop is an Apache open-source framework
written in Java that allows distributed
processing of large datasets across clusters of
computers using simple programming models.
 An application built on the Hadoop framework
runs in an environment that provides distributed
storage and computation across clusters of
computers.

 Hadoop began as a sub-project of Lucene (a collection of
industrial-strength search tools) and is now a top-level
project of the Apache Software Foundation.
 Hadoop parallelizes data processing across
many nodes (computers) in a compute cluster,
speeding up large computations and hiding I/O
latency through increased concurrency.
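The idea of splitting work across nodes and merging the partial results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads stand in for cluster nodes, and the helper names (`split_into_chunks`, `count_words`, `parallel_word_count`) are hypothetical, not part of Hadoop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_chunks):
    """Deal the input lines round-robin into num_chunks partitions."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def count_words(chunk):
    """Work done independently on one partition (one stand-in 'node')."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_workers=3):
    """Process every partition concurrently, then merge the partial results."""
    chunks = split_into_chunks(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up sample input for illustration only.
lines = [
    "hadoop stores data",
    "hadoop processes data",
    "clusters run hadoop",
]
result = parallel_word_count(lines)
print(result["hadoop"], result["data"])  # prints: 3 2
```

On a real cluster each partition would live on a different machine, so the partitions are read and processed at the same time; that overlap is what hides I/O latency.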

 Making Hadoop Applications More Widely
Accessible
 A Graphical Abstraction Layer on Top of Hadoop
Applications

 Ability to store and process huge amounts of
any kind of data, quickly
 Computing power
 Fault tolerance
 Flexibility
 Low cost
 Scalability

 Scalable
 Cost effective
 Flexible
 Fast
 Resilient to failure

 Security Concerns
 Vulnerable By Nature
 Not Fit for Small Data
 Potential Stability Issues
 General Limitations

 Hadoop Common (formerly Hadoop Core)
 Hadoop MapReduce
 Hadoop YARN (MapReduce 2.0)
 Hadoop Distributed File System (HDFS)
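To give a feel for what HDFS does with a file, the sketch below works out how many blocks a file is split into and how much raw disk space its replicas take. It assumes the common defaults of a 128 MB block size and a replication factor of 3; both are configurable on a real cluster, and the function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: default block size in Hadoop 2.x and later
REPLICATION = 3      # assumption: default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies the bytes it actually holds,
    # so raw storage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


print(hdfs_footprint(1000))  # prints: (8, 24, 3000)
```

Keeping several replicas of every block on different nodes is what lets the cluster keep serving data when a single machine fails.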

This work was partially supported by the
SCAPE Project.
The SCAPE project is co‐funded by the
European Union under FP7 ICT‐2009.4.1
(Grant Agreement number 270137).
 As a rule, when “standard tools” stop
working because of sheer data size
(rule of thumb: if your data fits on a regular
hard drive, you’re better off sticking to
Python/SQL/Bash/etc.!)

 Aggregation across large data sets: use the
power of Reducers!

 Large-scale ETL operations (extract,
transform, load)
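The aggregation and ETL cases above can be sketched as a tiny in-memory map, shuffle, reduce pipeline. This is a plain-Python illustration, not the Hadoop API: the record layout (date, user, amount), the sample data, and the function names are all assumptions made up for the example.

```python
from collections import defaultdict

# Made-up raw records: date,user,amount (one malformed line on purpose).
RAW_LOG = [
    "2024-01-01,alice,120",
    "2024-01-01,bob,80",
    "bad line",
    "2024-01-02,alice,30",
]


def mapper(line):
    """Extract and transform: parse one raw record into (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return  # skip malformed records
    _date, user, amount = parts
    yield user, int(amount)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate every value seen for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over the input and collect the totals."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in sorted(shuffle(pairs).items()))


totals = run_job(RAW_LOG)
print(totals)  # prints: {'alice': 150, 'bob': 80}
```

In a real job the mappers and reducers run on many nodes at once, and the "load" step would write the reducer output to HDFS or another store.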
 www.google.com
 www.wikipedia.com
 www.studymafia.org
 www.projectsreports.org
Thank You
ALL
