Bda Ut-1 Qbank Ans by Rba

Uploaded by

BEA02SOUMAN BAG
Copyright: © All Rights Reserved
Module 1

Q.1 Explain the types of Big Data.

There are three types:
1. Unstructured
2. Structured
3. Semi-structured

1. Unstructured data
i) Any data with unknown form or structure is classified as unstructured data.
ii) Example: a heterogeneous data source containing a combination of text files, images and videos, like a search in the Google search engine.
iii) Unstructured example: the output returned by Google Search.

Sources of unstructured data:
1) Web pages
2) Images, videos
3) Reports
4) Surveys
5) Word documents and PowerPoint presentations

Advantages:
1) Very scalable
2) Data is portable

Disadvantages:
1) Lack of security
2) Difficult to store and manage data

3. Semi-structured data
i) It is the third type of big data. Semi-structured data contains both the forms of data.
ii) The data does not conform to a data model but has some structure. The data cannot be stored in the form of rows and columns as in databases.

Sources of semi-structured data:
1) [illegible]
2) XML and other markup languages
3) Zipped files
4) Web pages
5) TCP/IP packets

Advantages:
1) Data is portable
2) Flexible, i.e. the schema can be easily changed

Disadvantages:
1) Queries are less efficient as compared to structured data.
2) Lack of a fixed, rigid schema makes storage of the data difficult.

Q.2 Explain characteristics of Big Data.

1) Volume: Volume represents the amount of data that is growing at a high rate, i.e. data volume in petabytes.
2) Value: Value refers to turning data into value.
3) Veracity: Veracity refers to the uncertainty of available data. It arises due to the high volume of data that brings incompleteness and inconsistency.
4) Visualization: Visualization is the process of displaying data in charts, graphs, maps and other visual forms.
5) Variety: Variety refers to the different data types, i.e. various data formats like text, audio, video etc.
6) Velocity: Velocity is the rate at which data grows. Social media contributes a major role in the velocity of growing data.
7) Virality: Virality describes how quickly data gets spread from people to people.

1.9.2 Hadoop Ecosystem Overview

The Hadoop ecosystem is a platform or framework that assists in solving big data problems. It includes different components and services (ingesting, storing, analysing, and maintaining). Most of the services available in the Hadoop ecosystem supplement the four core components of Hadoop: HDFS, YARN, MapReduce and Common. The Hadoop ecosystem contains both Apache open-source projects and a wide variety of commercial tools and solutions.

Fig. 1.9.1: Hadoop Ecosystem (data storage: HDFS (file system), HBase (column DB storage); data processing: YARN (cluster and resource management), MapReduce; data access: Pig (dataflow), Sqoop (RDBMS connector); management and monitoring: ZooKeeper, Oozie (workflow), Chukwa, Flume)

Some of the open-source examples are Spark, Hive, Pig, Sqoop and Oozie. As we have got some idea about what the Hadoop ecosystem is, what it does, and what its components are, let's discuss each concept in detail.

What is Big Data? Give its applications.

i) Big data is a massive collection of data that continues to grow dramatically over time.
ii) It is a dataset that is so huge and complicated that no typical data management technologies can effectively store or process it.
iii) [...] It is stated that almost 90% of today's data has been generated in the past few years.

Big data applications:
1) Fraud detection
2) IT log analytics
3) Call center analytics
4) Social media analysis

1.8.1 What is Hadoop?

Hadoop is an open-source software platform for storing massive volumes of data and running applications on clusters (groups) of commodity hardware.
It gives us massive data storage capability, massive computational power and the ability to handle virtually limitless jobs or tasks, whether running or waiting. Its main purpose is to support growing big data technologies, and thereby to support forward-thinking analytics such as predictive analytics, machine learning and data mining. Hadoop can handle different kinds of data: structured, unstructured and semi-structured. It gives us the flexibility to collect, process, and investigate data in ways that the older data warehouse concept could not.

1.8.3 Features of Hadoop

1. Suitable for Big Data Analysis: As Big Data tends to be distributed and unstructured in nature, Hadoop clusters are well suited for analysis of Big Data. Since it is the processing logic (not the actual data) that flows to the computing nodes, less network bandwidth is consumed. This concept is called data locality, which helps to increase the productivity of Hadoop-based applications.
2. Scalability: Hadoop clusters can easily be scaled to any extent by adding extra cluster nodes, which allows for the growth of Big Data. Also, scaling does not require adjustments to application logic.
3. Fault Tolerance: The Hadoop network has a facility to duplicate the input data on to other cluster nodes. So, in the event of a cluster node failure, data processing can still proceed by using data stored on another cluster node.

1.8.4 Advantages of Hadoop

1. Fast: In HDFS the data is distributed over the cluster and mapped in a way that helps in faster retrieval. Even the tools to process the data are often on the same servers, which reduces the processing time. It can process terabytes of data in minutes and petabytes in hours.
2. Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster.
3. Cost Effective: Hadoop is open source and uses commodity hardware to store data, so it is cheaper than a traditional RDBMS.
4. Resilient to failure: HDFS can replicate data over the network, so if one node is down or some other network failure happens, Hadoop takes another copy of the data and uses it. Normally, data is replicated thrice, but the replication factor is configurable.

Fig.: HDFS architecture (name node, backup node, data nodes)

i) HDFS is the primary storage system used by Hadoop applications.
ii) HDFS is a distributed file system and a framework for the analysis and transformation of huge datasets which uses the MapReduce paradigm. It is based on the Google File System (GFS).
iii) HDFS [...] data across Hadoop clusters.
iv) HDFS is usually deployed on low-cost commodity hardware, where the possibility of server failure is common.
v) HDFS facilitates the rapid transfer of data between different computer nodes and enables the Hadoop system to proceed with execution even if one or more nodes fail.
vi) The architecture of HDFS is known as a master-slave architecture.
vii) Name node: manages the metadata of files (information regarding files and directories). Data node: stores the actual data.

Advantages:
1) Scalability
2) Low limitation
3) [illegible]
4) Low cost

Disadvantages:
1) The programming model is very restrictive.
2) Cluster management is hard.
3) Very rough manner: it is still under active development.

Q. What is MapReduce?

i) MapReduce is a technique in which a huge program is subdivided into small tasks that run in parallel to make computation faster and save time. It is mostly used in distributed systems.
ii) It has two important parts:
1) Mapper: It takes raw data as input and organizes it into key-value pairs.
2) Reducer: It is responsible for processing data in parallel and produces the final output.

Advantages:
1) Scalability: MapReduce is designed to scale horizontally, allowing it to process [...] data.
2) Flexibility: Its ability to process structured and unstructured data makes it [...].
3) Data locality
4) Simplicity
5) Cost-effective

Q. List the limitations of Hadoop.

1) Issue with small files:
i) Hadoop fails when it needs to access a large number of small files.
ii) If there is a large amount of small files, the name node gets overloaded.
2) Vulnerable by nature: Hadoop is a framework that is written in Java, the most commonly used programming language, so it can be easily exploited by cyber criminals.
3) Lack of security: Data security is important for organizations, but security features are unavailable by default in Hadoop.
4) Supports only batch processing: Batch processes run in the background and do not have any kind of interaction with the user.

Q. Matrix multiplication by MapReduce. Write MapReduce pseudocode to multiply two matrices. Illustrate the procedure on the following matrices, clearly showing all the steps.

MapReduce is a technique in which a program is subdivided into small tasks that run in parallel to make computation faster and save time. It is mostly used in distributed systems. It has two important parts:
1) Mapper: It takes raw data as input and organizes it into key-value pairs.
2) Reducer: It is responsible for processing data in parallel and produces the final output.

For example, consider the following matrices:

A = | 1 2 |     B = | 5 6 |
    | 3 4 |         | 7 8 |

Here i is the row of A, j is the column of A and the row of B, and k is the column of B.

Input file:

Matrix  Row  Col  Value
A       0    0    1
A       0    1    2
A       1    0    3
A       1    1    4
B       0    0    5
B       0    1    6
B       1    0    7
B       1    1    8

Map function: each element is emitted under the key (i, k) of every output cell it contributes to.
For each element a(i,j) of A: key (i, k), value (A, j, a(i,j)), for every k.
For each element b(j,k) of B: key (i, k), value (B, j, b(j,k)), for every i.

Reduce function: the values are grouped by key (i, k).

Key     Values
(0,0)   (A,0,1), (A,1,2), (B,0,5), (B,1,7)
(0,1)   (A,0,1), (A,1,2), (B,0,6), (B,1,8)
(1,0)   (A,0,3), (A,1,4), (B,0,5), (B,1,7)
(1,1)   (A,0,3), (A,1,4), (B,0,6), (B,1,8)

For each key, the A and B values with the same j are multiplied and the products are added:

(0,0): (1×5) + (2×7) = 19
(0,1): (1×6) + (2×8) = 22
(1,0): (3×5) + (4×7) = 43
(1,1): (3×6) + (4×8) = 50

Hence the resultant matrix is

| 19 22 |
| 43 50 |

Q. State the features of HDFS architecture.

1) Cost-effective: The data nodes, which store the actual data, are inexpensive.
2) Variety and volume of data: HDFS can store data of any size, up to petabytes, and of any format (structured, unstructured, etc.).
3) [...] If a machine gets crashed, then the possibility of [...]
4) High availability: [...] replicas of the data blocks on the data nodes [...]
5) Data integrity: [...]
6) Scalability: Vertical scalability: add more resources (CPU, memory, disk) [...]

Q. List the advantages of a combiner.

i) Reduces the time taken for transferring the data from mapper to reducer.
ii) Reduces the size of the intermediate output generated by the mapper.
iii) Improves performance by minimizing network congestion.
iv) Reduces the workload on the reducer.
v) Optimizes MapReduce jobs.

Q. Explain NoSQL business drivers.

Fig.: Business drivers around the RDBMS (volume, velocity, variability, agility)

1) Volume
2) Velocity
3) Variability
4) Agility

3) Variability:
i) [...] the data [...]
ii) Companies that want to capture and report on exception data struggle when attempting to use the rigid schema structure imposed by RDBMS systems.
iii) Adding new columns to an RDBMS requires shutting down the system and running ALTER TABLE. When the database is large, this affects the availability of the system [...]

4) Agility:
i) Agility is the ability to accept change easily.
ii) Putting data into and getting data out of the database.

3.1.4 CAP Theorem (4 Marks)

It plays an important role in NoSQL databases.
The CAP theorem, also called Brewer's theorem, states that it is impossible for a distributed data store to offer more than two out of three guarantees: consistency, availability and partition tolerance. So basically, some NoSQL databases offer consistency and partition tolerance, while some offer availability and partition tolerance. Partition tolerance is common to both because NoSQL databases are distributed in nature. So, based on the requirement, we can choose which NoSQL database has to be used. Different types of NoSQL databases are available based on data models.

Fig 3.1.1: CAP Property

Consistency: This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.

Availability: This means that the system is always on (service guarantee availability), with no downtime.

Partition Tolerance: This means that the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.

In theory it is impossible to fulfil all 3 requirements. CAP provides the basic requirements for a distributed system to follow 2 of the 3 requirements. Therefore, all the current NoSQL databases follow different combinations of C, A and P from the CAP theorem. Here is a brief description of the three combinations CA, CP and AP:

CA: Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
CP: Some data may not be accessible, but the rest is still consistent/accurate.
AP: The system is still available under partitioning, but some of the data returned may be inaccurate.

The word consistency does not refer to the same concept in CAP and in ACID. In CAP, the term consistency refers to the consistency of the values in different copies of the same data item in a replicated distributed system.
In ACID, consistency refers to the fact that a transaction will not violate the integrity constraints specified on the database schema.

Difference between RDBMS and NoSQL:

Parameter       RDBMS                                         NoSQL
Scalability     Limited scalability                           No limit on scaling
Transactions    Transactions written in one location          Transactions written in many locations
                Supports complex transactions                 Supports simple transactions
Security        [illegible]                                   [illegible]
Data velocity   Used to handle data coming in low velocity    Used to handle data coming in high velocity
Examples        MySQL, Oracle, SQLite, MS-SQL, etc.           BigTable, Redis, RavenDB, CouchDB, etc.

Q. Characteristics of NoSQL.

1) Non-relational:
i) NoSQL databases never follow the relational model.
ii) They never provide tables with flat fixed-column records.
2) Open-source: NoSQL databases don't require expensive licensing fees and can run on inexpensive hardware.
3) Schema-free: NoSQL databases are either schema-free or have relaxed schemas.
4) Simple API: Mostly no standard-based query language is used.
5) Distributed: NoSQL databases can be executed in a distributed fashion and offer auto-scaling and fail-over capabilities.

Q. State examples of NoSQL databases.

1) DynamoDB (it is developed by Amazon)
2) Berkeley DB (it is developed by Oracle)
3) Redis: an advanced open-source key-value store
4) Riak: an open-source, powerful distributed database with [...] scalability
5) VoltDB: a scalable database that offers complete transactions, [...] and ultra-high throughput

NoSQL Data Architecture Patterns

Q. Explain in detail the key-value store NoSQL architectural pattern.

1) Key-value store:

Key          Value
user-123     John [...]
image-123    binary image [...]
[...]
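As a rough illustration of the key-value pattern described above, the sketch below keeps opaque values in a Python dictionary and looks them up only by key. The class name KeyValueStore and its put/get/delete methods are hypothetical names chosen for this example, not the API of any particular NoSQL product. The keys user-123 and image-123 follow the table in the notes; the stored values are placeholder assumptions.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self) -> None:
        # Every record is just a key mapped to an opaque value.
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert a new value or overwrite the existing one for this key.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Lookup is by key only; the store never inspects the value.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Return True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user-123", "John")            # text value (assumed sample data)
store.put("image-123", b"\x89PNG")       # binary value (placeholder bytes)

print(store.get("user-123"))
print(store.get("missing-key", "not found"))
```

The store has no schema and no query by value: a client must know the key to read a record. That restriction is what makes the pattern simple to distribute, because keys can be spread over many servers.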
