
HDFS Read and Write Operation

1. Write Operation

When a client wants to write a file to HDFS, it first communicates with the NameNode for metadata. The NameNode responds with the number of blocks, their locations, the replicas, and other details. Based on the information from the NameNode, the client then interacts directly with the DataNodes.
The client first sends block A to DataNode 1, along with the IPs of the two other DataNodes where the replicas will be stored. When DataNode 1 receives block A from the client, it copies the block to DataNode 2 in the same rack; since both DataNodes are in the same rack, the block is transferred via the rack switch. DataNode 2 then copies the block to DataNode 4 on a different rack; since these DataNodes are in different racks, the block is transferred via an out-of-rack switch.
When a DataNode receives a block from the client, it sends a write confirmation to the NameNode. The same process is repeated for each block of the file (see the replication-factor sketch after these two subsections).

2. Read Operation

To read from HDFS, the client first communicates with the NameNode for metadata. The NameNode responds with the locations of the DataNodes containing the blocks. After receiving the DataNode locations, the client interacts directly with the DataNodes.
The client starts reading data in parallel from the DataNodes, based on the information received from the NameNode; the data flows directly from the DataNodes to the client. When the client or application has received all the blocks of the file, it combines them back into the original file.
The sections below walk through how the client reads and writes files in Hadoop HDFS in more detail.
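The number of replicas written in the pipeline above is governed by the file's replication factor, which a client may also set per file at creation time. A minimal sketch, assuming a FileSystem object as in the Java samples later in this document (the path and factor are placeholder values):

// Create the file with an explicit replication factor of 3; the NameNode
// then directs the DataNode pipeline to make that many copies.
FSDataOutputStream out = fileSystem.create(new Path("/path/to/file.ext"), (short) 3);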
HDFS Read Operation

To read files stored in HDFS, the HDFS client interacts with the NameNode and the DataNodes. Before beginning with the HDFS read operation, here is a short introduction to the components involved:
1. HDFS Client: On the user's behalf, the HDFS client interacts with the NameNode and the DataNodes to fulfill user requests.
2. NameNode: The NameNode is the master node that stores metadata such as block locations, the blocks of a file, etc. This metadata is used for file read and write operations.
3. DataNode: DataNodes are the slave nodes in HDFS. They store the actual data (data blocks).
Suppose the HDFS client wants to read a file "File.txt", and let the file be divided into two blocks, say A and B. The following steps take place during the file read:


1. The client interacts with the HDFS NameNode
•As the NameNode stores the metadata of the blocks of the file "File.txt", the client reaches out to the NameNode asking for the locations of the DataNodes containing the data blocks.
•The NameNode first checks for the required privileges; if the client has sufficient privileges, the NameNode sends the locations of the DataNodes containing blocks A and B. The NameNode also gives the client a security token, which it must show to the DataNodes for authentication. Let the NameNode provide the following locations: for block A, DataNodes D2, D5, and D7; for block B, DataNodes D3, D9, and D11.
2. The client interacts with the HDFS DataNodes
•After receiving the addresses of the DataNodes, the client interacts directly with them. The client sends a request to the closest DataNodes (D2 for block A and D3 for block B) through the FSDataInputStream object. The DFSInputStream manages the interaction between the client and the DataNodes.
•The client shows the security tokens provided by the NameNode to the DataNodes and starts reading data. The data flows directly from the DataNodes to the client.
•After reading all the required file blocks, the client calls the close() method on the FSDataInputStream object.
The client performs the various other HDFS operations (write, copy, move, change permission, etc.) through the NameNode and DataNodes in the same manner.
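Because FSDataInputStream is seekable, a client does not have to read a file sequentially from the beginning. A minimal sketch, assuming a fileSystem object as in the Java samples (the path and offsets are placeholder values):

FSDataInputStream in = fileSystem.open(new Path("/path/to/File.txt"));
// Jump to an arbitrary byte offset, e.g. the start of the second block
// (128 MB is the default block size in recent Hadoop versions).
in.seek(134217728L);
byte[] buffer = new byte[4096];
// Positional read: fetch bytes at a given offset without moving the stream.
int bytesRead = in.read(0L, buffer, 0, buffer.length);
in.close();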


Internals of file read in HDFS
How to Read a File from HDFS – Java Program
A sample code to read a file from HDFS is as follows (assuming conf is a Hadoop Configuration object, as sketched after the code):

FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
FSDataInputStream in = fileSystem.open(path);
byte[] b = new byte[1024]; // read buffer
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    // Code to manipulate the data which is read; here it is printed as text.
    System.out.print(new String(b, 0, numBytes));
}
in.close();
fileSystem.close();
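Both Java samples in this document assume a conf object. A minimal sketch of that setup (the URI is a placeholder, not from the original text):

// Requires org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.*.
Configuration conf = new Configuration();
// Optional: point the client at a specific cluster instead of relying on
// the Hadoop configuration files on the classpath.
conf.set("fs.defaultFS", "hdfs://namenode-host:9000");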
HDFS write operation

Introduction to HDFS
• HDFS is the distributed file system in Hadoop for storing huge volumes and varieties of data. HDFS follows a master-slave architecture in which the NameNode is the master node and the DataNodes are the slave nodes. Files in HDFS are broken into data blocks; the NameNode stores the metadata about the blocks, and the DataNodes store the data blocks themselves.
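The block layout described above can be inspected from client code. A minimal sketch, assuming fileSystem and path as in the other snippets (requires org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.BlockLocation, and java.util.Arrays):

FileStatus status = fileSystem.getFileStatus(path);
BlockLocation[] blocks = fileSystem.getFileBlockLocations(status, 0, status.getLen());
for (BlockLocation block : blocks) {
    // Each entry is one block: its byte range and the DataNodes holding replicas.
    System.out.println("offset=" + block.getOffset()
            + " length=" + block.getLength()
            + " hosts=" + Arrays.toString(block.getHosts()));
}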



HDFS Nodes
As we know, Hadoop works in a master-slave fashion, and HDFS has two types of nodes that work in the same manner: the NameNode(s) and the DataNodes.
1. HDFS Master (NameNode)
The NameNode regulates file access for clients. It maintains and manages the slave nodes and assigns tasks to them. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories (see the sketch after this section). The NameNode runs on high-configuration hardware.
2. HDFS Slave (DataNode)
There are n slaves (where n can be up to 1000), called DataNodes, in the Hadoop Distributed File System; they manage the storage of data. These slave nodes are the actual worker nodes that do the work and serve read and write requests from the file system's clients.
They perform block creation, deletion, and replication upon instruction from the NameNode. Once a block is written on a DataNode, the DataNode replicates it to another DataNode, and the process continues until the required number of replicas has been created.
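Since the NameNode executes the namespace operations mentioned above, simple client calls such as the following all go through it before any DataNode is involved. A minimal sketch with placeholder paths:

// Namespace operations (create directory, rename, delete) are resolved
// by the NameNode; no file data flows here.
fileSystem.mkdirs(new Path("/user/demo"));
fileSystem.rename(new Path("/user/demo"), new Path("/user/demo2"));
fileSystem.delete(new Path("/user/demo2"), true); // true = recursive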
Hadoop HDFS Daemons
There are two daemons which run on HDFS for data storage:
•NameNode: This is the daemon that runs on all the masters. The NameNode stores metadata such as the file name, the number of blocks, the number of replicas, the locations of blocks, block IDs, etc.
This metadata is kept in memory on the master for faster retrieval of data, and a copy is kept on the local disk for persistence. The NameNode's memory should therefore be sized accordingly.
•DataNode: This is the daemon that runs on the slaves. These are the actual worker nodes that store the data.


Internals of file write in Hadoop HDFS
Let us understand the HDFS write operation in detail. The steps are the ones described in the write operation above: the client obtains block details from the NameNode, streams each block to the first DataNode, and the DataNodes replicate the block along the pipeline until the required number of replicas exists.

How to Write a File in HDFS – Java Program
A sample code to write a file to HDFS in Java is as follows (source is assumed to hold the path of the local file to upload):

FileSystem fileSystem = FileSystem.get(conf);
// Check if the file already exists
Path path = new Path("/path/to/file.ext");
if (fileSystem.exists(path)) {
    System.out.println("File " + path + " already exists");
    return;
}
// Create a new file and write data to it.
FSDataOutputStream out = fileSystem.create(path);
InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
byte[] b = new byte[1024];
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
// Close all the file descriptors
in.close();
out.close();
fileSystem.close();
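As a design note, Hadoop ships a helper that can replace the manual copy loop above. A minimal sketch, assuming the same in, out, and conf objects:

// org.apache.hadoop.io.IOUtils copies using the buffer size configured in
// conf; a four-argument overload can additionally close both streams.
IOUtils.copyBytes(in, out, conf);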
Overview of HDFS Architecture
In Hadoop HDFS, the NameNode is the master node and the DataNodes are the slave nodes. A file in HDFS is stored as data blocks.
The file is divided into blocks (A, B, and C in the example). These blocks are stored on different DataNodes according to the Rack Awareness algorithm: block A on DataNode 1 (DN-1), block B on DataNode 6 (DN-6), and block C on DataNode 7 (DN-7).
To provide fault tolerance, replicas of blocks are created based on the replication factor. In the example, two additional replicas of each block are created (using the default replication factor of 3). The replicas are placed on different DataNodes, ensuring data availability even in the case of a DataNode or rack failure.
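Rack placement can be observed from client code as well, since each BlockLocation exposes its network topology path. A minimal sketch, assuming fileSystem and path as before:

FileStatus status = fileSystem.getFileStatus(path);
for (BlockLocation block : fileSystem.getFileBlockLocations(status, 0, status.getLen())) {
    // Topology paths look like /rack/host, so replicas on different racks
    // are directly visible.
    for (String location : block.getTopologyPaths()) {
        System.out.println(location);
    }
}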

