
HDFS Read and Write Operation

1. Write Operation

When a client wants to write a file to HDFS, it first communicates with the NameNode for metadata. The NameNode responds with the number of blocks, their locations, the replicas, and other details. Based on the information from the NameNode, the client then interacts directly with the DataNodes.
The client first sends block A to DataNode 1, along with the IPs of the two other DataNodes where the replicas will be stored. When DataNode 1 receives block A from the client, it copies the block to DataNode 2 in the same rack; since both DataNodes are in the same rack, the block is transferred via the rack switch. DataNode 2 then copies the block to DataNode 4 on a different rack; since these DataNodes are in different racks, the block is transferred via an out-of-rack switch.
When a DataNode receives a block from the client, it sends a write confirmation to the NameNode. The same process is repeated for each block of the file (see the replication-factor sketch after these two subsections).

2. Read Operation

To read from HDFS, the client first communicates with the NameNode for metadata. The NameNode responds with the locations of the DataNodes containing the blocks. After receiving the DataNode locations, the client interacts directly with the DataNodes.
The client starts reading data in parallel from the DataNodes, based on the information received from the NameNode; the data flows directly from the DataNodes to the client. When the client or application has received all the blocks of the file, it combines them back into the original file.
The sections below walk through how the client reads and writes files in Hadoop HDFS in more detail.
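The number of replicas written in the pipeline above is governed by the file's replication factor, which a client may also set per file at creation time. A minimal sketch, assuming a FileSystem object as in the Java samples later in this document (the path and factor are placeholder values):

// Create the file with an explicit replication factor of 3; the NameNode
// then directs the DataNode pipeline to make that many copies.
FSDataOutputStream out = fileSystem.create(new Path("/path/to/file.ext"), (short) 3);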
HDFS Read Operation

To read files stored in HDFS, the HDFS client interacts with the NameNode and the DataNodes. Before beginning with the HDFS read operation, here is a short introduction to the components involved:
1. HDFS Client: On the user's behalf, the HDFS client interacts with the NameNode and the DataNodes to fulfill user requests.
2. NameNode: The NameNode is the master node that stores metadata such as block locations, the blocks of a file, etc. This metadata is used for file read and write operations.
3. DataNode: DataNodes are the slave nodes in HDFS. They store the actual data (data blocks).
Suppose the HDFS client wants to read a file "File.txt", and let the file be divided into two blocks, say A and B. The following steps take place during the file read:


1. The client interacts with the HDFS NameNode
•As the NameNode stores the metadata of the blocks of the file "File.txt", the client reaches out to the NameNode asking for the locations of the DataNodes containing the data blocks.
•The NameNode first checks for the required privileges; if the client has sufficient privileges, the NameNode sends the locations of the DataNodes containing blocks A and B. The NameNode also gives the client a security token, which it must show to the DataNodes for authentication. Let the NameNode provide the following locations: for block A, DataNodes D2, D5, and D7; for block B, DataNodes D3, D9, and D11.
2. The client interacts with the HDFS DataNodes
•After receiving the addresses of the DataNodes, the client interacts directly with them. The client sends a request to the closest DataNodes (D2 for block A and D3 for block B) through the FSDataInputStream object. The DFSInputStream manages the interaction between the client and the DataNodes.
•The client shows the security tokens provided by the NameNode to the DataNodes and starts reading data. The data flows directly from the DataNodes to the client.
•After reading all the required file blocks, the client calls the close() method on the FSDataInputStream object.
The client performs the various other HDFS operations (write, copy, move, change permission, etc.) through the NameNode and DataNodes in the same manner.
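Because FSDataInputStream is seekable, a client does not have to read a file sequentially from the beginning. A minimal sketch, assuming a fileSystem object as in the Java samples (the path and offsets are placeholder values):

FSDataInputStream in = fileSystem.open(new Path("/path/to/File.txt"));
// Jump to an arbitrary byte offset, e.g. the start of the second block
// (128 MB is the default block size in recent Hadoop versions).
in.seek(134217728L);
byte[] buffer = new byte[4096];
// Positional read: fetch bytes at a given offset without moving the stream.
int bytesRead = in.read(0L, buffer, 0, buffer.length);
in.close();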


Internals of file read in HDFS
How to Read a File from HDFS – Java Program
A sample code to read a file from HDFS is as follows (assuming conf is a Hadoop Configuration object, as sketched after the code):

FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
FSDataInputStream in = fileSystem.open(path);
byte[] b = new byte[1024]; // read buffer
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    // Code to manipulate the data which is read; here it is printed as text.
    System.out.print(new String(b, 0, numBytes));
}
in.close();
fileSystem.close();
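Both Java samples in this document assume a conf object. A minimal sketch of that setup (the URI is a placeholder, not from the original text):

// Requires org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.*.
Configuration conf = new Configuration();
// Optional: point the client at a specific cluster instead of relying on
// the Hadoop configuration files on the classpath.
conf.set("fs.defaultFS", "hdfs://namenode-host:9000");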
HDFS write operation

Introduction to HDFS
• HDFS is the distributed file system in Hadoop for storing huge volumes and varieties of data. HDFS follows a master-slave architecture in which the NameNode is the master node and the DataNodes are the slave nodes. Files in HDFS are broken into data blocks; the NameNode stores the metadata about the blocks, and the DataNodes store the data blocks themselves.
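The block layout described above can be inspected from client code. A minimal sketch, assuming fileSystem and path as in the other snippets (requires org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.BlockLocation, and java.util.Arrays):

FileStatus status = fileSystem.getFileStatus(path);
BlockLocation[] blocks = fileSystem.getFileBlockLocations(status, 0, status.getLen());
for (BlockLocation block : blocks) {
    // Each entry is one block: its byte range and the DataNodes holding replicas.
    System.out.println("offset=" + block.getOffset()
            + " length=" + block.getLength()
            + " hosts=" + Arrays.toString(block.getHosts()));
}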



HDFS Nodes
As we know, Hadoop works in a master-slave fashion, and HDFS has two types of nodes that work in the same manner: the NameNode(s) and the DataNodes.
1. HDFS Master (NameNode)
The NameNode regulates file access for clients. It maintains and manages the slave nodes and assigns tasks to them. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories (see the sketch after this section). The NameNode runs on high-configuration hardware.
2. HDFS Slave (DataNode)
There are n slaves (where n can be up to 1000), called DataNodes, in the Hadoop Distributed File System; they manage the storage of data. These slave nodes are the actual worker nodes that do the work and serve read and write requests from the file system's clients.
They perform block creation, deletion, and replication upon instruction from the NameNode. Once a block is written on a DataNode, the DataNode replicates it to another DataNode, and the process continues until the required number of replicas has been created.
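Since the NameNode executes the namespace operations mentioned above, simple client calls such as the following all go through it before any DataNode is involved. A minimal sketch with placeholder paths:

// Namespace operations (create directory, rename, delete) are resolved
// by the NameNode; no file data flows here.
fileSystem.mkdirs(new Path("/user/demo"));
fileSystem.rename(new Path("/user/demo"), new Path("/user/demo2"));
fileSystem.delete(new Path("/user/demo2"), true); // true = recursive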
Hadoop HDFS Daemons
There are two daemons which run on HDFS for data storage:
•NameNode: This is the daemon that runs on all the masters. The NameNode stores metadata such as the file name, the number of blocks, the number of replicas, the locations of blocks, block IDs, etc.
This metadata is kept in memory on the master for faster retrieval of data, and a copy is kept on the local disk for persistence. The NameNode's memory should therefore be sized accordingly.
•DataNode: This is the daemon that runs on the slaves. These are the actual worker nodes that store the data.


Internals of file write in Hadoop HDFS
Let us understand the HDFS write operation in detail. The steps are the ones described in the write operation above: the client obtains block details from the NameNode, streams each block to the first DataNode, and the DataNodes replicate the block along the pipeline until the required number of replicas exists.

How to Write a File in HDFS – Java Program
A sample code to write a file to HDFS in Java is as follows (source is assumed to hold the path of the local file to upload):

FileSystem fileSystem = FileSystem.get(conf);
// Check if the file already exists
Path path = new Path("/path/to/file.ext");
if (fileSystem.exists(path)) {
    System.out.println("File " + path + " already exists");
    return;
}
// Create a new file and write data to it.
FSDataOutputStream out = fileSystem.create(path);
InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
byte[] b = new byte[1024];
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
// Close all the file descriptors
in.close();
out.close();
fileSystem.close();
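As a design note, Hadoop ships a helper that can replace the manual copy loop above. A minimal sketch, assuming the same in, out, and conf objects:

// org.apache.hadoop.io.IOUtils copies using the buffer size configured in
// conf; a four-argument overload can additionally close both streams.
IOUtils.copyBytes(in, out, conf);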
Overview of HDFS Architecture
In Hadoop HDFS, the NameNode is the master node and the DataNodes are the slave nodes. A file in HDFS is stored as data blocks.
The file is divided into blocks (A, B, and C in the example). These blocks are stored on different DataNodes according to the Rack Awareness algorithm: block A on DataNode 1 (DN-1), block B on DataNode 6 (DN-6), and block C on DataNode 7 (DN-7).
To provide fault tolerance, replicas of blocks are created based on the replication factor. In the example, two additional replicas of each block are created (using the default replication factor of 3). The replicas are placed on different DataNodes, ensuring data availability even in the case of a DataNode or rack failure.
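Rack placement can be observed from client code as well, since each BlockLocation exposes its network topology path. A minimal sketch, assuming fileSystem and path as before:

FileStatus status = fileSystem.getFileStatus(path);
for (BlockLocation block : fileSystem.getFileBlockLocations(status, 0, status.getLen())) {
    // Topology paths look like /rack/host, so replicas on different racks
    // are directly visible.
    for (String location : block.getTopologyPaths()) {
        System.out.println(location);
    }
}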

