The document provides a comprehensive overview of the Hadoop Distributed File System (HDFS), including its history, key features, architecture, and commands. It highlights the evolution of Hadoop from its inception in 2003 to the introduction of cloud-based services and Apache Spark. Additionally, it compares HDFS with other file systems and outlines the process for scaling out Hadoop clusters.


Hadoop Distributed File System
History of Hadoop
Year  Event
2003  Google publishes the MapReduce and Google File System (GFS) papers that later inspire Hadoop.
2005  Doug Cutting and Mike Cafarella begin the work that becomes Hadoop, as part of the Nutch project.
2006  Hadoop becomes a standalone project, named after Doug Cutting's son's toy elephant.
2008  Hadoop becomes a top-level project of the Apache Software Foundation.
2010  Hadoop 1.0 is released.
2011  Ecosystem projects such as Hive, Pig, HBase, and ZooKeeper mature around Hadoop.
2012  Hadoop becomes a mainstream platform for big data processing.
2014  Hadoop 2.0 arrives with YARN and an improved HDFS.
2015  Cloud-based Hadoop services emerge, such as Amazon EMR, Google Dataproc, and Azure HDInsight.
2019  Apache Spark is widely adopted alongside, and often in place of, Hadoop MapReduce.


HDFS
 HDFS stands for Hadoop Distributed File System.
 HDFS is fault-tolerant and designed to be deployed on low-cost,
commodity hardware.
 HDFS provides high throughput and is suited to applications with large
data sets that need streaming access to their data.
HDFS vs. Other File Systems

Feature | Database | NTFS | EXT4 | APFS | HDFS
Purpose | Manages structured data with queries | General-purpose file system | General-purpose file system | Optimized for Apple devices | Distributed storage for big data
Primary Use Case | Applications needing tables/queries | Windows storage | Linux storage | macOS/iOS storage | Big data analytics
File Size Limit | Depends on DB engine/schema | 16 EB (theoretical) | 16 TB | 8 EB | No fixed limit (block-based)
Partition Size | Depends on DB engine | 256 TB | 1 EB | 8 EB | Scales across multiple nodes
Key Features of HDFS
 Distributed Storage
 Fault Tolerance
 High Throughput
 Scalability
 Write-Once, Read-Many
 Large File Support
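Several of these features map directly to settings in hdfs-site.xml; a minimal fragment is sketched below (the values shown are the common defaults, included only for illustration, not as tuning advice):

```xml
<!-- hdfs-site.xml (fragment) -->
<configuration>
  <!-- Fault tolerance: each block is replicated on 3 DataNodes by default. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Large file support: files are split into 128 MB blocks (bytes). -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```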
Key Components of HDFS
 DataNode
 NameNode
 Secondary NameNode
 Client
 Backup Node
 Replication management
 Rack awareness
 Read & write operations
Key Differences

Feature | Secondary NameNode | Backup Node | Standby Node
Primary Role | Checkpointing and merging edit logs with fsimage. | Maintaining an in-memory replica of the NameNode's metadata. | High Availability (HA): provides automatic failover.
Data Handling | Merges edit logs and fsimage to reduce log size. | Holds a replica of the NameNode's in-memory metadata. | Synchronized with the active NameNode for failover.
Failover Capability | Does not provide failover capability. | Does not automatically handle failover; requires manual promotion. | Provides automatic failover in case of NameNode failure.
Use Case | Reduces NameNode recovery time and prevents large logs. | Can be promoted to NameNode if the NameNode fails. | Ensures NameNode availability by switching roles in case of failure.
Interaction with NameNode | Periodically merges logs and fsimage to reduce load. | Can take over from the NameNode in case of failure, but manually promoted. | Continuously synchronized and takes over automatically.
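With HA configured, the active and standby roles can be inspected and switched with `hdfs haadmin`; a brief sketch (the service IDs `nn1`/`nn2` are illustrative, and correspond to whatever is defined under `dfs.ha.namenodes.*` in hdfs-site.xml):

```shell
# Check which NameNode is currently active and which is standby.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Manually initiate a graceful failover from nn1 to nn2.
hdfs haadmin -failover nn1 nn2
```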
HDFS Architecture
(Diagram slide: a single NameNode manages the filesystem namespace and block locations, while multiple DataNodes store the actual data blocks; clients contact the NameNode for metadata and read/write blocks directly from DataNodes.)
Scaling Out in Hadoop
 Scaling out refers to adding more nodes to a cluster to increase its
capacity for handling larger datasets and processing workloads.
 This is in contrast to scaling up, which involves upgrading the existing
hardware with more powerful components (e.g., more CPU, memory,
or storage).
Steps for Scaling Out
 Add New Nodes to the Cluster
 Install Hadoop on New Nodes
 Update Configuration Files
 Start Hadoop Services
 Rebalance Data Across Nodes
 Monitor the Cluster
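The steps above can be sketched as a short command sequence on a typical Hadoop 3.x install (the hostname and file locations are illustrative and vary by distribution):

```shell
# 1. Register the new node in the workers file on the NameNode host
#    (this file is named 'slaves' on Hadoop 2.x installs).
echo "newnode.example.com" >> $HADOOP_HOME/etc/hadoop/workers

# 2. On the new node, after installing Hadoop and copying the cluster's
#    core-site.xml and hdfs-site.xml, start the DataNode daemon:
hdfs --daemon start datanode

# 3. From any cluster node, rebalance blocks across DataNodes;
#    -threshold is the allowed deviation (%) in disk usage per node.
hdfs balancer -threshold 10

# 4. Verify the new node is listed and check per-node capacity.
hdfs dfsadmin -report
```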
Hadoop Streaming
 A Hadoop-specific utility that allows users to write MapReduce programs in any
language that can read from stdin and write to stdout.
 It is not for real-time streaming; it operates on batch processing of large datasets
stored in HDFS.
 Allows users to process data in parallel using the MapReduce framework.

Key Features:
 Executes scripts in non-Java languages for batch processing.
 Part of Hadoop's MapReduce ecosystem.
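A minimal sketch of a streaming word count in Python. In a real job the mapper and reducer are separate scripts reading stdin and writing stdout; here they are written as plain functions so the logic can be run locally (all file names and paths below are illustrative):

```python
# Hypothetical Hadoop Streaming word count: the mapper emits one
# "word\t1" pair per word; the reducer receives pairs sorted by key
# (Hadoop's shuffle phase guarantees this) and sums consecutive counts.
# In a real mapper.py you would write: for p in map_lines(sys.stdin): print(p)

def map_lines(lines):
    """Mapper: emit one tab-separated 'word\\t1' pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_pairs(pairs):
    """Reducer: sum counts of consecutive identical words."""
    current, total = None, 0
    for pair in pairs:
        word, count = pair.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

if __name__ == "__main__":
    # Local dry run; sorted() stands in for Hadoop's shuffle phase.
    for out in reduce_pairs(sorted(map_lines(["the cat sat", "the dog"]))):
        print(out)
```

On a cluster, such scripts would be submitted with something like `hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar -input /user/hadoop/data -output /user/hadoop/out -mapper mapper.py -reducer reducer.py` (the jar location varies by install).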
HDFS COMMANDS

mkdir ls get

put cat
HDFS COMMANDS
mkdir:
hdfs dfs -mkdir /path/to/directory
hdfs dfs -mkdir /user/hadoop/data

ls:
hdfs dfs -ls /path/to/directory
hdfs dfs -ls /user/hadoop

put:
hdfs dfs -put /local/path/file /hdfs/path
hdfs dfs -put /home/user/data.txt /user/hadoop/data

cat:
hdfs dfs -cat /path/to/file
hdfs dfs -cat /user/hadoop/data/data.txt

get:
hdfs dfs -get /hdfs/path/file /local/path
hdfs dfs -get /user/hadoop/data/data.txt /home/user
HDFS COMMANDS

cp mv rm

chown touchz
HDFS COMMANDS
 cp:
hdfs dfs -cp /source/path /destination/path
hdfs dfs -cp /user/hadoop/data.txt /user/hadoop/backup/data.txt

 mv:
hdfs dfs -mv /source/path /destination/path
hdfs dfs -mv /user/hadoop/data.txt /user/hadoop/old_data.txt

 rm:
hdfs dfs -rm /path/to/file
hdfs dfs -rm -r /path/to/directory

 chown:
hdfs dfs -chown [user]:[group] /path/to/file_or_directory
hdfs dfs -chown hadoop:supergroup /user/hadoop/data.txt

 touchz:
hdfs dfs -touchz /path/to/file
hdfs dfs -touchz /user/hadoop/empty_file.txt
HDFS COMMANDS

du df setrep

clear stat
HDFS COMMANDS
 du:
hdfs dfs -du [-s] [-h] /path
hdfs dfs -du /user/hadoop/project

 df:
hdfs dfs -df [path]
hdfs dfs -df /
Filesystem             Size    Used   Available  Use%
hdfs://localhost:9000  1000GB  400GB  600GB      40%

 setrep:
hdfs dfs -setrep -w [replication_factor] /path
hdfs dfs -setrep -w 3 /user/hadoop/project/data.txt

 stat:
hdfs dfs -stat [format] /path
hdfs dfs -stat "%n %b %r %y" /user/hadoop/project/data.txt

 clear:
clear
(Clears the terminal screen; this is a shell command, not an HDFS one.)
Thank you
