Unit 3 - BD - Hadoop Ecosystem
Hadoop
Apache open-source software framework
Inspired by:
- Google MapReduce
- Google File System
Let’s look at a few statistics to get an idea of how much data gets generated every day, every
minute, and every second.
Every day
NYSE generates 1.5 billion shares and trade data
Facebook stores 2.7 billion comments and likes
Google processes about 24 petabytes of data
Every minute
Facebook users share nearly 2.5 million pieces of content.
Amazon generates over $80,000 in online sales.
Twitter users tweet nearly 300,000 times.
Instagram users post nearly 220,000 new photos
Apple users download nearly 50,000 apps.
Email users send over 2,000 million messages.
YouTube users upload 72 hours of new video content.
Every second
Banking applications process more than 10,000 credit card
transactions.
Data Challenges
To process, analyze, and make sense of these different kinds of data, a system is
needed that scales and addresses these challenges.
Hadoop was created by Doug Cutting, the creator of Apache Lucene (a text search
library). Hadoop began as part of Apache Nutch (an open-source web search engine,
itself a Lucene subproject) and was later developed at Yahoo!. The name Hadoop is not an
acronym; it’s a made-up name.
Key Aspects of Hadoop
Data Management
Data Access
Data Processing
Data Storage
[Diagram: HDFS and YARN as the storage and processing layers underpinning these aspects]
HDFS
[Diagram: an HDFS cluster consisting of one NameNode and multiple DataNodes]
The Hadoop Distributed File System (HDFS) is the primary data storage
system used by Hadoop applications.
HDFS holds very large amounts of data and employs a NameNode and
DataNode architecture to implement a distributed file system that provides
high-performance access to data across highly scalable Hadoop clusters.
To store such huge data, files are spread across multiple machines and
stored redundantly, so that the system can recover from data loss if a
machine fails.
It runs on commodity hardware.
Unlike many other distributed systems, HDFS is highly fault-tolerant and is
designed to run on low-cost hardware.
1. The metadata stored about a file consists of the file name, file path, number of
blocks, block IDs, and replication level.
2. This metadata is stored on the NameNode's local disk. The NameNode uses two
files for storing it: FsImage and EditLog.
3. The NameNode also keeps in its memory the locations of the DataNodes
that store the blocks of any given file. Using that information, the NameNode
can reconstruct the whole file by getting the locations of all its blocks.
Example
(File Name, numReplicas, rack-ids, machine-ids, block-ids, …)
/user/in4072/data/part-0, 3, r:3, M3, {1, 3}, …
/user/in4072/data/part-1, 3, r:2, M1, {2, 4, 5}, …
/user/in4072/data/part-2, 3, r:1, M2, {6, 9, 8}, …
[Diagram: NameNode and Secondary NameNode]
With Hadoop 2.0, HDFS has automated failover with a hot standby and
full-stack resiliency built into the platform.
1. Automated Failover: Hadoop pro-actively detects NameNode host and
process failures and will automatically switch to the standby NameNode to
maintain availability for the HDFS service. There is no need for human
intervention in the process – System Administrators can sleep in peace!
2. Hot Standby: Both the Active and the Standby NameNode have up-to-date HDFS
metadata, ensuring seamless failover even for large clusters – which means
no downtime for your HDP cluster!
3. Full Stack Resiliency: The entire Hadoop stack (MapReduce, Hive, Pig,
HBase, Oozie etc.) has been certified to handle a NameNode failure scenario
without losing data or job progress. This ensures that long-running jobs which
are critical to complete on schedule are not adversely affected by a
NameNode failure.
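From a client's point of view, the hot standby is visible only through configuration. Below is a minimal, illustrative sketch of the client-side settings for an HA-enabled cluster; the nameservice name mycluster and the hosts nn1-host/nn2-host are assumptions for this example, not values from these slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Use a logical nameservice instead of a single NameNode host (assumed name: mycluster)
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");

        // Two NameNodes participate: one active, one hot standby (assumed hosts)
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1-host:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2-host:8020");

        // Proxy provider that fails over to the standby when the active NameNode is unreachable
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client code never needs to know which NameNode is currently active
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}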
All machines in a rack are connected through the same network switch, so if that
switch goes down, all machines in that rack are out of service – the whole
rack is down. Rack Awareness was introduced in Apache Hadoop to overcome
this issue. With Rack Awareness, the NameNode chooses DataNodes that are in
the same rack or a nearby rack. The NameNode maintains the rack ID of each
DataNode and chooses DataNodes based on this rack information. The NameNode
also ensures that not all replicas of a block are stored on the same rack. The
default replication factor is 3. Therefore, according to the Rack Awareness
algorithm (a small illustrative sketch follows these rules):
When the Hadoop framework creates a new block, it places the first replica on
the local node, the second on a node in a different rack, and the third on a
different node in that same remote rack.
When re-replicating a block, if the number of existing replicas is one, place
the second on a different rack.
When the number of existing replicas is two and both are in the same rack,
place the third one on a different rack.
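The three placement rules above can be condensed into a few lines of code. The sketch below is illustrative only: it mimics the default placement behaviour described on this slide for replication factor 3, and does not reproduce Hadoop's actual BlockPlacementPolicyDefault implementation.

import java.util.ArrayList;
import java.util.List;

public class RackAwarePlacementSketch {

    // Simplified cluster node: just a name and a rack ID
    record Node(String name, String rackId) {}

    // Choose target nodes for a new block with replication factor 3
    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();

        // Rule 1: first replica on the local (writer's) node
        targets.add(writer);

        // Rule 2: second replica on a node in a different rack
        Node remote = cluster.stream()
                .filter(n -> !n.rackId().equals(writer.rackId()))
                .findFirst()
                .orElseThrow();
        targets.add(remote);

        // Rule 3: third replica on a different node in the same remote rack
        Node sameRemoteRack = cluster.stream()
                .filter(n -> n.rackId().equals(remote.rackId()) && !n.name().equals(remote.name()))
                .findFirst()
                .orElseThrow();
        targets.add(sameRemoteRack);

        return targets;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("DN1", "r1"), new Node("DN2", "r1"),
                new Node("DN3", "r2"), new Node("DN4", "r2"));
        // Writer sits on DN1 in rack r1; expect replicas on DN1, DN3, DN4
        System.out.println(chooseTargets(new Node("DN1", "r1"), cluster));
    }
}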
Rack Awareness & Replication
[Diagram: three racks, each containing DataNodes DN1–DN4; blocks B1, B2, and B3 are each replicated on DataNodes in different racks]
HDFS follows a write-once, read-many model: files already stored in HDFS
cannot be edited, but data can be appended by reopening the file.
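A hedged sketch of such an append through the public FileSystem API is shown below; the path /user/demo/log.txt is an assumed example, and append must be supported and enabled on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/log.txt"); // assumed example path

        // The existing contents cannot be edited in place, but the file can be
        // reopened in append mode and new bytes written at the end.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("one more line\n");
        }
        fs.close();
    }
}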
Anatomy of File Write
[Diagram: the client calls create() on DistributedFileSystem (1. create), which asks the NameNode to create the file (2. create); the client writes to the returned FSDataOutputStream (3. write); data packets are streamed through a pipeline of DataNode1 → DataNode2 → DataNode3 (4. write packet) and acknowledgements flow back (5. acknowledge packet); the client then closes the stream (6. close) and the NameNode is notified that the file is complete (7. complete)]
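From the client's point of view, the whole pipeline above is hidden behind a single output stream. A minimal sketch of a client-side write, assuming an illustrative path /user/demo/sample.txt:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Steps 1-2: DistributedFileSystem asks the NameNode to create the file entry
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/sample.txt"); // assumed example path

        // Step 3: create() returns an FSDataOutputStream; packets written to it
        // are streamed through the DataNode pipeline (steps 4-5) transparently.
        try (FSDataOutputStream out = fs.create(file, true /* overwrite */)) {
            out.writeBytes("Hello I am expert in Big Data\n");
        } // Steps 6-7: close() flushes the last packets and the file is marked complete

        fs.close();
    }
}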
Anatomy of File Read
[Diagram: the client node reads blocks directly from the DataNodes (4.1 read, 4.2 read, 4.3 read)]
1. The client opens the file that it wishes to read from by calling open() on the
DistributedFileSystem
2. DistributedFileSystem communicates with the NameNode to get the
location of the data blocks. NameNode returns the address of the
DataNodes that the data blocks are stored on. Subsequent to this,
DistributedFileSystem returns DFSInputStream (i.e. a class) to the client to
read from the file.
3. The client then calls read() on the DFSInputStream, which has the addresses
of the DataNodes holding the first few blocks of the file, and connects to the
closest DataNode for the first block in the file.
4. Client calls read() repeatedly to stream the data from the DataNode.
5. When the end of the block is reached, DFSInputStream closes the
connection with the DataNode. It repeats the steps to find the best
DataNode for the next block and subsequent blocks.
6. When the client completes the reading of the file, it calls close() on
FSDataInputStream to close the connection.
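The same walkthrough, seen from client code: open() returns a stream that internally asks the NameNode for block locations and reads from the closest DataNodes. A minimal sketch, assuming the illustrative path /user/demo/sample.txt:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);           // step 1: client-side file system handle

        Path file = new Path("/user/demo/sample.txt");  // assumed example path

        // Steps 2-3: open() obtains block locations from the NameNode and returns a stream
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            // Steps 4-5: read() streams data from the closest DataNode, block by block
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } // Step 6: close() tears down the DataNode connection

        fs.close();
    }
}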
HDFS Commands
Let’s assume that this sample.txt file contains a few lines of text. The content of the file is
as follows:
Hello I am expert in Big Data
How can I help you
How can I assist you
Are you an engineer
Are you looking for coding
Are you looking for interview questions
what are you doing these days
what are your strengths
Hence, the above 8 lines are the content of the file. Let’s assume that while storing this
file in Hadoop, HDFS broke it into four parts and named the parts first.txt,
second.txt, third.txt, and fourth.txt. So the file is divided into four equal parts,
each containing 2 lines: the first two lines go into first.txt, the next two into second.txt,
the next two into third.txt, and the last two into fourth.txt. All these parts are stored on
DataNodes, and the NameNode holds the metadata about them. All of this is the task of HDFS.
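To make the split between data (on DataNodes) and metadata (on the NameNode) concrete, the sketch below copies a local sample.txt into HDFS and then asks for the locations of its blocks; the paths are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("sample.txt");              // assumed local file
        Path remote = new Path("/user/demo/sample.txt");  // assumed HDFS path

        // HDFS splits the file into blocks and replicates them across DataNodes
        fs.copyFromLocalFile(local, remote);

        // The NameNode's metadata tells us which DataNodes hold each block
        FileStatus status = fs.getFileStatus(remote);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}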