Bda Ut-1 Qbank Ans by Rba
Uploaded by
BEA02SOUMAN BAG
Copyright: © All Rights Reserved
Module 1

Q.1 Explain the types of Big Data.

There are three types: 1. Unstructured, 2. Structured, 3. Semi-structured.

1. Unstructured data
i) Any data with unknown form or structure is classified as unstructured data.
ii) Example: a heterogeneous data source containing a combination of text, images, and videos.
iii) Example: the output returned by a Google search.
Sources of unstructured data:
1) Web pages
2) Images and videos
3) Reports
4) Surveys
5) Word documents and PowerPoint presentations
Advantages:
1) Very scalable
2) Data is portable
Disadvantages:
1) Lack of security
2) Difficult to store and manage data

3. Semi-structured data
i) This is the third type of big data. Semi-structured data contains both forms of data.
ii) The data does not fit into a data model, but it has some structure.
iii) The data cannot be stored in the form of rows and columns as in databases.
Sources of semi-structured data:
1) [illegible in source]
2) XML and other markup languages
3) [illegible in source]
4) Zipped files
5) Web pages
6) TCP/IP packets
Advantages:
1) Data is portable
2) Flexible, i.e. the schema can be easily changed
Disadvantages:
1) Queries are less efficient compared to structured data.
2) The lack of a fixed, rigid schema makes storage of the data difficult.

Q.2 Explain the characteristics of Big Data.

1) Volume: Volume represents the amount of data, which is growing at a high rate, i.e. data volume in petabytes.
2) Value: Value refers to turning data into value.
3) Veracity: Veracity refers to the uncertainty of available data. It arises because the high volume of data brings incompleteness and inconsistency.
4) Visualization: Visualization is the process of displaying data in charts, graphs, maps, or other visual forms.
5) Variety: Variety refers to the different data types, i.e. various data formats such as text, audio, and video.
6) Velocity: Velocity is the rate at which data grows. Social media plays a major role in the velocity of growing data.
7) Virality: Virality describes how quickly data spreads from person to person over a network.

1.9.2 Hadoop Ecosystem Overview

The Hadoop ecosystem is a platform or framework that helps in solving big data problems. It includes different components and services (ingesting, storing, analysing, and maintaining). Most of the services in the Hadoop ecosystem supplement the four core components of Hadoop: HDFS, YARN, MapReduce, and Common. The Hadoop ecosystem contains both Apache open-source projects and a wide variety of commercial tools and solutions.

[Fig. 1.9.1: Hadoop Ecosystem. Layers shown: data storage (HDFS as file system, HBase as column DB storage); data processing and cluster/resource management (MapReduce, YARN); data access (Pig for dataflow, Sqoop as RDBMS connector); management, workflow, and monitoring (ZooKeeper, Oozie, Chukwa, Flume).]

Some open-source examples are Spark, Hive, Pig, Sqoop, and Oozie. Now that we have some idea of what the Hadoop ecosystem is, what it does, and what its components are, let's discuss each concept in detail.

Q. What is Big Data? Give its applications.

i) Big data is a massive collection of data that continues to grow dramatically over time.
ii) It is a dataset so huge and complicated that no typical data management technology can effectively store or process it.
iii) The size of Big Data is expressed in large byte units (the exact units are illegible in the source). It is stated that almost 90% of today's data has been generated in the past few years.
Big data applications:
1) Fraud detection
2) IT log analytics
3) Call center analytics
4) Social media analysis

1.8.1 What is Hadoop?

Hadoop is an open-source software platform for storing massive volumes of data and running applications on clusters (groups) of commodity hardware.
It provides massive data storage, massive computational power, and the ability to handle a virtually limitless number of concurrent jobs or tasks. Its main purpose is to support growing big data technologies and thereby advanced analytics such as predictive analytics, machine learning, and data mining. Hadoop can handle different kinds of data: structured, unstructured, and semi-structured. It gives the flexibility to collect, process, and analyse data in ways that traditional data warehouses could not.

1.8.3 Features of Hadoop

1. Suitable for Big Data analysis: As Big Data tends to be distributed and unstructured in nature, Hadoop clusters are well suited for its analysis. Because it is the processing logic (not the actual data) that flows to the computing nodes, less network bandwidth is consumed. This concept is called data locality, and it helps to increase the productivity of Hadoop-based applications.
2. Scalability: Hadoop clusters can easily be scaled to any extent by adding extra cluster nodes, which allows for the growth of Big Data. Also, scaling does not require adjustments to application logic.
3. Fault tolerance: The Hadoop ecosystem has a facility to replicate the input data onto other cluster nodes. So, in the event of a cluster node failure, data processing can still proceed by using data stored on another cluster node.

1.8.4 Advantages of Hadoop

1. Fast: In HDFS, data is distributed over the cluster and mapped in a way that helps in faster retrieval. Even the tools that process the data are often on the same servers, which reduces processing time and makes data management efficient. Hadoop can process terabytes of data in minutes and petabytes in hours.
2. Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster, so the chance of failure can be lower.
3. Cost effective: Hadoop is open source and uses commodity hardware to store data, so it is cheaper than a traditional RDBMS.
4. Resilient to failure: HDFS can replicate data over the network, so if one node is down or some other network failure happens, Hadoop uses another copy of the data. Normally, data is replicated three times, but the replication factor is configurable.

[Fig.: HDFS architecture, showing a NameNode, a backup node, and DataNodes.]

i) HDFS is the primary storage system used by Hadoop applications.
ii) HDFS is a distributed file system and a framework for the analysis and transformation of huge datasets, which uses the MapReduce paradigm.
iii) HDFS provides access to data across Hadoop clusters.
iv) HDFS is usually deployed on low-cost commodity hardware, where the possibility of server failure is common.
v) The file system facilitates the rapid transfer of data between different compute nodes and enables Hadoop systems to proceed with execution even if one or more nodes fail.
vi) The architecture used by HDFS is known as master-slave architecture.
vii) It has a NameNode, which manages the metadata of files (information regarding files and directories), and DataNodes, which store the actual data.
Advantages:
1) High scalability
2) [illegible in source]
3) [illegible in source]
4) Low cost
Disadvantages:
1) The programming model is very restrictive.
2) Cluster management is hard.
3) Still rough: the software is under active development.

Q. What is MapReduce?

i) MapReduce is a technique in which a huge program is subdivided into small tasks that run in parallel to make computation faster. It is mostly used in distributed systems.
ii) It has two important parts:
a) Mapper: It takes raw data input and organizes it into key-value pairs.
b) Reducer: It is responsible for processing data in parallel and produces the final output.
Features:
1) Scalability: MapReduce is designed to scale horizontally, allowing it to process massive data.
2) Flexibility: It can process both structured and unstructured data.
3) Data locality
4) Simplicity
5) Cost-effectiveness

Q. Limitations of Hadoop.

1) Issue with small files:
i) Hadoop fails when it needs to access a large number of small files.
ii) If there is a large number of small files, the NameNode will be overloaded.
2) Vulnerable by nature: Hadoop is a framework written in Java, one of the most commonly used programming languages, so it can be easily exploited by cyber criminals.
3) Lack of data security: Security is a concern for organizations because some security features are turned off by default or are unavailable in Hadoop.
4) Supports only batch processing: Hadoop supports batch processing only. Batch jobs run in the background and do not provide any kind of interaction with the user.

Q. Matrix-vector multiplication by MapReduce. Write MapReduce pseudocode to multiply two matrices. Illustrate the procedure on the following matrices, clearly showing all the steps.

MapReduce is a technique in which a program is subdivided into small tasks that run in parallel to make computation faster. It saves time and is mostly used in distributed systems. It has two important parts:
1) Mapper: It takes raw data input and organizes it into key-value pairs.
2) Reducer: It is responsible for processing data in parallel and produces the final output.

For example, consider the following matrices:

    A = | 1  2 |      B = | 5  6 |
        | 3  4 |          | 7  8 |

Here i is the row of A, j is the column of A (and the row of B), and k is the column of B.

Input file:

    Matrix  Row  Col  Value
    A       0    0    1
    A       0    1    2
    A       1    0    3
    A       1    1    4
    B       0    0    5
    B       0    1    6
    B       1    0    7
    B       1    1    8

Map function: for each element A[i][j], emit key (i,k) with value (A, j, A[i][j]) for every column k of B; for each element B[j][k], emit key (i,k) with value (B, j, B[j][k]) for every row i of A.

Reduce function, grouping the values by key (i,k):

    Key (i,k)   Values
    (0,0)       (A,0,1), (A,1,2), (B,0,5), (B,1,7)
    (0,1)       (A,0,1), (A,1,2), (B,0,6), (B,1,8)
    (1,0)       (A,0,3), (A,1,4), (B,0,5), (B,1,7)
    (1,1)       (A,0,3), (A,1,4), (B,0,6), (B,1,8)

Reduce: for each key, multiply the A and B values that share the same j and add the products.

    (0,0): (1×5) + (2×7) = 19
    (0,1): (1×6) + (2×8) = 22
    (1,0): (3×5) + (4×7) = 43
    (1,1): (3×6) + (4×8) = 50

Hence the resultant matrix is:

    | 19  22 |
    | 43  50 |

Q. State the features of HDFS.

1) The DataNodes, which store the actual data, run on inexpensive hardware.
2) HDFS can store data of any size, up to petabytes, and in any format (structured or unstructured).
3) Replication: If a machine crashes there is a possibility of data loss, so HDFS keeps replicas of the data.
4) High availability: Replicas of data blocks are kept on other DataNodes, so the data stays available [details illegible in source].
5) [illegible in source]
6) Scalability: With vertical scalability, more resources (CPU, memory, disk) are added to existing nodes; more machines can also be added to the cluster.

Q. List the advantages of the combiner.

i) Reduces the time taken to transfer data from the mapper to the reducer.
ii) Reduces the size of the intermediate output generated by the mapper.
iii) Improves performance by minimizing network congestion.
iv) Reduces the workload on the reducer and helps optimize MapReduce jobs.

Q. Explain NoSQL business drivers.

[Figure: RDBMS at the centre, surrounded by the four drivers: volume, velocity, variability, agility.]

1) Volume
2) Velocity
3) Variability
4) Agility

3) Variability:
i) Capturing and reporting on exception data is a struggle when attempting to use the rigid schema structures imposed by RDBMS systems.
ii) Adding new columns to an RDBMS requires shutting down the system and running ALTER TABLE commands.
iii) When the database is large, this can affect the availability of the system, costing time and money.
4) Agility:
i) Agility is the ability to accept changes quickly and easily.
ii) It concerns putting data into, and getting data out of, the database.

3.1.4 CAP Theorem (4 Marks)

It plays an important role in NoSQL databases.
The CAP theorem, also called Brewer's theorem, states that it is impossible for a distributed data store to offer more than two out of three guarantees: consistency, availability, and partition tolerance. So some NoSQL databases offer consistency and partition tolerance, while others offer availability and partition tolerance. Partition tolerance is common to both because NoSQL databases are distributed in nature. Based on the requirement, we can choose which NoSQL database has to be used. Different types of NoSQL databases are available based on data models.

[Fig. 3.1.1: CAP property]

Consistency: This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
Availability: This means that the system is always on (guaranteed service availability), with no downtime.
Partition tolerance: This means that the system continues to function even when communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.

In theory, it is impossible to fulfil all three requirements. CAP sets the basic requirement that a distributed system can satisfy two of the three. Therefore, current NoSQL databases follow different combinations of C, A, and P from the CAP theorem. Here is a brief description of the three combinations CA, CP, and AP:
CA: Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
CP: Some data may not be accessible, but the rest is still consistent and accurate.
AP: The system is still available under partitioning, but some of the data returned may be inaccurate.

The word consistency in CAP and in ACID does not refer to the same concept. In CAP, consistency refers to the consistency of the values in different copies of the same data item in a replicated distributed system.
In ACID, it refers to the fact that a transaction will not violate the integrity constraints specified on the database schema.

Difference between SQL and NoSQL (the first rows of the table are illegible in the source):

    Parameter             SQL                                    NoSQL
    Scalability           Limited scalability                    No limit on scaling
    Transaction location  Transactions written in one location   [illegible in source]
    Transaction support   Supports complex transactions          Supports simple transactions
    Security              [illegible in source]                  [illegible in source]
    Data velocity         Handles data coming in at low          Handles data coming in at high
                          velocity                               velocity
    Examples              MySQL, Oracle, SQLite, MS-SQL, etc.    BigTable, Redis, RavenDB, CouchDB, etc.

Q. Characteristics of NoSQL.

1) Non-relational:
i) NoSQL databases never follow the relational model.
ii) They never provide tables with flat, fixed-column records.
2) Open source: NoSQL databases don't require expensive licensing fees and can run on inexpensive hardware.
3) Schema-free: NoSQL databases are either schema-free or have relaxed schemas.
4) Simple API: Mostly no standards-based query language is used.
5) Distributed: NoSQL databases can be executed in a distributed fashion and offer auto-scaling and fail-over capabilities.

Q. State examples of NoSQL databases.

1) DynamoDB: developed by Amazon.
2) Berkeley DB: now developed by Oracle.
3) Redis: an advanced open-source key-value store.
4) Riak: a powerful open-source distributed database with scalability [details illegible in source].
5) VoltDB: a scalable database that offers transactional support and ultra-high throughput.

Q. NoSQL data architecture patterns. Explain in detail the key-value store NoSQL architectural pattern.

Key-value store:

    Key         Value
    user-123    "John"
    image-123   binary image data
    [remaining rows illegible in source]
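A key-value store maps each unique key to a value and looks values up by key only. The idea can be sketched in Python as below. This is a minimal illustration, assuming an in-memory dictionary as the backing store; the class name KeyValueStore and its put, get, and delete methods are hypothetical and are not the API of any particular NoSQL product. The keys reuse the user-123 and image-123 examples from the table above.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # The whole "database" is one mapping from key to opaque value.
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert a new pair or overwrite the value of an existing key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Look up by key; return None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the pair; report whether the key existed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user-123", "John")        # value is a plain string
store.put("image-123", b"\x89PNG")   # value is raw bytes (stand-in for an image)

print(store.get("user-123"))   # John
print(store.get("missing"))    # None
```

The store never inspects the value, so a string and binary data are handled the same way. That is why this pattern has no query language beyond put, get, and delete.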