
The Communication Protocols

The document discusses how the DataNode stores HDFS data in separate files across local directories and sends a Blockreport to the NameNode on startup. It also describes how HDFS uses TCP/IP and RPC for communication between clients, DataNodes and the NameNode.

Uploaded by Chris Harris

The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge of HDFS files; it stores each block of HDFS data in a separate file in its local file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories accordingly. Creating all local files in the same directory is not optimal, because the local file system might not be able to efficiently support a huge number of files in a single directory. When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files, and sends this report to the NameNode. The report is called the Blockreport.
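The storage layout and the startup scan described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DataNode's actual implementation: the fixed cap `FILES_PER_DIR`, the `subdirN` bucketing rule, the `blk_<id>` file names, and the helper names `block_path`, `store_block`, and `generate_blockreport` are all hypothetical simplifications of the heuristic the text mentions.

```python
import os
import re
import tempfile

# Hypothetical cap on block files per directory; a real DataNode uses its own heuristic.
FILES_PER_DIR = 4
# Hypothetical naming scheme: one local file per block, named blk_<id>.
BLOCK_FILE = re.compile(r"^blk_(\d+)$")


def block_path(root: str, block_id: int) -> str:
    """Return the local path for a block, bucketing files into subdirectories
    so that no single directory holds more than FILES_PER_DIR block files."""
    subdir = os.path.join(root, f"subdir{block_id // FILES_PER_DIR}")
    return os.path.join(subdir, f"blk_{block_id}")


def store_block(root: str, block_id: int, data: bytes) -> str:
    """Write one HDFS block to its own local file, creating the subdirectory if needed."""
    path = block_path(root, block_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


def generate_blockreport(root: str) -> list[int]:
    """Walk the whole local storage tree and list every block ID found.
    The DataNode sends a list like this to the NameNode as its Blockreport."""
    block_ids = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = BLOCK_FILE.match(name)
            if match:  # ignore files that are not block files
                block_ids.append(int(match.group(1)))
    return sorted(block_ids)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for block_id in range(10):
            store_block(root, block_id, b"x" * block_id)
        print(generate_blockreport(root))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        print(sorted(os.listdir(root)))    # ['subdir0', 'subdir1', 'subdir2']
```

Bucketing by `block_id // FILES_PER_DIR` keeps every directory at or below the cap without extra bookkeeping; a real heuristic may instead adapt the layout to the capabilities of the local file system.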

The Communication Protocols

All HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine and talks to the NameNode using the ClientProtocol. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the ClientProtocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.
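The layering described above (a byte stream underneath, an RPC abstraction on top, two protocols served by one NameNode that only responds) can be illustrated with an in-process Python sketch. Everything here is hypothetical: the classes `ClientProtocol` and `DatanodeProtocol`, their methods `get_block_locations` and `block_report`, and the length-prefixed JSON framing are illustrative stand-ins, not Hadoop's real interfaces or wire format. The TCP socket itself is omitted so the example runs without a network; the frames are the bytes that would be written to it.

```python
import json
import struct


# Hypothetical, heavily simplified stand-ins for the two protocols.
class ClientProtocol:
    def __init__(self, namespace):
        self.namespace = namespace  # maps a file path to its block IDs

    def get_block_locations(self, path):
        return self.namespace.get(path, [])


class DatanodeProtocol:
    def __init__(self):
        self.reports = {}  # latest Blockreport per DataNode

    def block_report(self, datanode_id, block_ids):
        self.reports[datanode_id] = sorted(block_ids)
        return "ack"


def encode_frame(message: dict) -> bytes:
    """Length-prefix a JSON message so it can travel over a TCP byte stream."""
    body = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def decode_frame(frame: bytes) -> dict:
    """Read the 4-byte big-endian length, then decode that many bytes of JSON."""
    (length,) = struct.unpack(">I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))


class NameNodeRpcServer:
    """Passive server: it only turns request frames into response frames and
    never sends a request of its own."""

    def __init__(self, client_protocol, datanode_protocol):
        self.protocols = {"ClientProtocol": client_protocol,
                          "DatanodeProtocol": datanode_protocol}

    def handle(self, request_frame: bytes) -> bytes:
        request = decode_frame(request_frame)
        method = request["method"]
        if method.startswith("_"):  # expose only public protocol methods
            raise ValueError(f"method not allowed: {method}")
        target = self.protocols[request["protocol"]]
        result = getattr(target, method)(*request["args"])
        return encode_frame({"result": result})


def call(server, protocol, method, *args):
    """Caller-side stub: a client or a DataNode always initiates the exchange."""
    request = encode_frame({"protocol": protocol, "method": method, "args": list(args)})
    return decode_frame(server.handle(request))["result"]


if __name__ == "__main__":
    server = NameNodeRpcServer(ClientProtocol({"/data/a.txt": [101, 102]}), DatanodeProtocol())
    print(call(server, "DatanodeProtocol", "block_report", "dn-1", [102, 101]))  # ack
    print(call(server, "ClientProtocol", "get_block_locations", "/data/a.txt"))  # [101, 102]
```

Because `NameNodeRpcServer` exposes only `handle`, which maps a request frame to a response frame, the server has no way to start a conversation; every exchange begins with a `call` from a client or a DataNode, mirroring the design rule above.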
