What Is Hadoop

Hadoop is an open source software platform that addresses the problems of storing and analyzing large volumes of data. It uses HDFS for inexpensive and reliable data storage by breaking files into blocks and storing multiple redundant copies across clusters of commodity servers. Hadoop also uses MapReduce to analyze both structured and unstructured data by distributing the work across nodes and processing data in parallel, leveraging HDFS's data distribution to improve performance. These techniques allow Hadoop to cost-effectively manage exabytes of data and gain insights that help control processes, predict demand, and build better products and services.

Uploaded by

krishnanand


What Is Hadoop?

Managing Big Data in the Enterprise
Introduction
Data volumes are growing much faster than compute power. This growth demands new strategies for processing and analyzing information. According to IDC [1], the amount of digital information produced in 2011 will be ten times that produced in 2006: 1,800 exabytes. The majority of this data will be unstructured: complex data poorly suited to management by structured storage systems like relational databases.

Unstructured data comes from many sources and takes many forms: web logs, text files, sensor readings, user-generated content like product reviews or text messages, audio, video and still imagery, and more.

Large volumes of complex data can hide important insights. Are there buying patterns in point-of-sale data that can forecast demand for products at particular stores? Do RFID tag reads show anomalies in the movement of goods during distribution? Do user logs from a web site, or calling records in a mobile network, contain information about relationships among individual customers? Can a collection of nucleotide sequences be assembled into a single gene? Companies that can extract facts like these from the huge volume of data can better control processes and costs, can better predict demand and can build better products.

Dealing with big data requires two things:
  - Inexpensive, reliable storage; and
  - New tools for analyzing unstructured and structured data.

Apache Hadoop is a powerful open source software platform that addresses both of these problems. Hadoop is an Apache Software Foundation project. Cloudera offers commercial support and services to Hadoop users.

[1] An Updated Forecast of Worldwide Information Growth Through 2011, IDC, March 2008.

What is Hadoop? Big Data in the Enterprise

Reliable Storage: HDFS
Major Internet properties like Google, Amazon, Facebook and Yahoo! have pioneered the use of networks of inexpensive computers for large-scale data storage and processing. HDFS uses these techniques to store enterprise data.

Hadoop includes a fault-tolerant storage system called the Hadoop Distributed File System, or HDFS. HDFS is able to store huge amounts of information, scale up incrementally and survive the failure of significant parts of the storage infrastructure without losing data.

Hadoop creates clusters of machines and coordinates work among them. Clusters can be built with inexpensive computers. If one fails, Hadoop continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster.

HDFS manages storage on the cluster by breaking incoming files into pieces, called blocks, and storing each of the blocks redundantly across the pool of servers. In the common case, HDFS stores three complete copies of each file by copying each piece to three different servers:
Figure 1: HDFS distributes file blocks among servers

HDFS has several useful features. In the very simple example shown, any two servers can fail, and the entire file will still be available. HDFS notices when a block or a node is lost, and creates a new copy of missing data from the replicas it manages. Because the cluster stores several copies of every block, more clients can read them at the same time without creating bottlenecks.

Other fault-tolerant storage systems are often more expensive than HDFS.

Of course there are many other redundancy techniques, including the various strategies employed by RAID machines. HDFS offers two key advantages over RAID: it requires no special hardware, since it can be built from commodity servers, and it can survive more kinds of failure: a disk, a node on the network or a network interface.

The one obvious objection to HDFS, its consumption of three times the necessary storage space for the files it manages, is not so serious, given the plummeting cost of storage. In addition, HDFS offers some real advantages for data processing, as the next section will show.
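The storage behavior described in this section can be illustrated with a small simulation. This is a minimal sketch, not HDFS code: the function names, the server names and the round-robin placement policy are all assumptions made for illustration, and real HDFS chooses replica locations differently. It splits a file into blocks, stores three replicas of each block on distinct servers, checks that the file survives the loss of any two servers, and restores the replica count after a failure.

```python
from itertools import combinations

REPLICATION = 3  # the "common case" of three copies described in the text


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, servers: list[str],
                 replication: int = REPLICATION) -> dict[int, set[str]]:
    """Assign each block to `replication` distinct servers.

    Round-robin placement is a hypothetical policy used only for illustration.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        block_id: {servers[(block_id + r) % len(servers)] for r in range(replication)}
        for block_id in range(num_blocks)
    }


def file_available(placement: dict[int, set[str]], failed: set[str]) -> bool:
    """The file is readable if every block has at least one surviving replica."""
    return all(holders - failed for holders in placement.values())


def re_replicate(placement: dict[int, set[str]], failed: set[str],
                 servers: list[str], replication: int = REPLICATION) -> dict[int, set[str]]:
    """Copy under-replicated blocks from surviving replicas to other live servers."""
    live = [s for s in servers if s not in failed]
    repaired = {}
    for block_id, holders in placement.items():
        survivors = holders - failed
        if not survivors:
            raise RuntimeError(f"block {block_id} has no surviving replica")
        candidates = [s for s in live if s not in survivors]
        needed = max(0, min(replication, len(live)) - len(survivors))
        repaired[block_id] = survivors | set(candidates[:needed])
    return repaired


servers = ["s1", "s2", "s3", "s4", "s5"]            # hypothetical server names
blocks = split_into_blocks(b"x" * 500, block_size=100)
placement = place_blocks(len(blocks), servers)

# Any two servers can fail and every block still has a live copy.
print(all(file_available(placement, set(pair))
          for pair in combinations(servers, 2)))    # True

# Raw storage consumed is three times the logical file size.
print(sum(len(b) for b in blocks) * REPLICATION)    # 1500
```

With three replicas on three different servers, losing two servers can remove at most two copies of any block, which is why the check over every pair of servers succeeds. Losing three servers can make a block unavailable in this toy layout.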

Hadoop for Big Data Analysis
Many popular tools for enterprise data management, relational database systems for example, are designed to make simple queries run quickly. They use techniques like indexing to examine just a small portion of all the available data in order to answer a question.

Hadoop is designed for large-scale analyses that need to examine all the data in a repository.

Hadoop is a different sort of tool. Hadoop is aimed at problems that require examination of all the available data. For example, text analysis and image processing generally require that every single record be read, and often interpreted in the context of similar records. Hadoop uses a technique called MapReduce to carry out this exhaustive analysis quickly.

In the previous section, we saw that HDFS distributes blocks from a single file among a large number of servers for reliability. Hadoop takes advantage of this data distribution by pushing the work involved in an analysis out to many different servers. Each of the servers runs the analysis on its own block from the file. Results are collated and digested into a single result after each piece has been analyzed.
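The pattern just described, analyzing each block independently and then collating the partial results, can be sketched as a word count. This is an illustrative sketch in plain Python rather than the Hadoop MapReduce API: the function names are hypothetical, a thread pool stands in for the servers of a cluster, and each block is assumed to contain whole words.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_block(block: str) -> Counter:
    """Map step: analyze one block on its own and emit partial word counts."""
    return Counter(block.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results into one."""
    return left + right


def run_job(blocks: list[str]) -> Counter:
    """Run the map step on every block in parallel, then collate the results."""
    # Each worker thread stands in for a server analyzing its own block.
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce(reduce_counts, partials, Counter())


blocks = ["big data big", "data tools", "big insights"]
totals = run_job(blocks)
print(totals["big"], totals["data"])  # 3 2
```

Because the map step touches only its own block, adding more blocks and more workers does not change the code, only the degree of parallelism.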


Hadoop takes advantage of HDFS's data distribution strategy to push work out to many nodes in a cluster. This allows analyses to run in parallel and eliminates the bottlenecks imposed by monolithic storage systems.

Figure 2: Hadoop pushes work out to the data

Running the analysis on the nodes that actually store the data delivers much better performance than reading data over the network from a single centralized server. Hadoop monitors jobs during execution, and will restart work lost due to node failure if necessary. In fact, if a particular node is running very slowly, Hadoop will restart its work on another server with a copy of the data.
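The recovery behavior above depends on every block being stored on more than one server. The following sketch shows the idea under simplifying assumptions: the block layout, server names and function names are hypothetical, and the scheduler simply moves a task to another replica holder, whereas a real scheduler also weighs load and may run a duplicate task alongside a slow one.

```python
BlockMap = dict[int, list[str]]  # block id -> servers holding a replica


def assign_local(block_holders: BlockMap) -> dict[int, str]:
    """Give each block's task to a server that already stores the block."""
    return {block: holders[0] for block, holders in block_holders.items()}


def reassign(assignment: dict[int, str], block_holders: BlockMap,
             unavailable: set[str]) -> dict[int, str]:
    """Move tasks off failed or slow servers to another holder of the same block."""
    updated = {}
    for block, server in assignment.items():
        if server not in unavailable:
            updated[block] = server  # task keeps running where it is
            continue
        alternatives = [s for s in block_holders[block] if s not in unavailable]
        if not alternatives:
            raise RuntimeError(f"no available replica for block {block}")
        updated[block] = alternatives[0]
    return updated


# Hypothetical layout: three blocks, each stored on three servers.
holders = {1: ["a", "b", "c"], 2: ["b", "c", "d"], 3: ["c", "d", "e"]}

tasks = assign_local(holders)
print(tasks)                             # {1: 'a', 2: 'b', 3: 'c'}

# Server "b" fails or runs very slowly: its task restarts on server "c",
# which also stores block 2, so the data is still read locally.
print(reassign(tasks, holders, {"b"}))   # {1: 'a', 2: 'c', 3: 'c'}
```

In both the original and the updated assignment, every task runs on a server that holds its block, so no block has to be fetched from a central store.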

Summary
Hadoop's MapReduce and HDFS use simple, robust techniques on inexpensive computer systems to deliver very high data availability and to analyze enormous amounts of information quickly. Hadoop offers enterprises a powerful new tool for managing big data.

For more information, please contact Cloudera at:
info@cloudera.com
+1 650 362 0488
http://www.cloudera.com/
