Skip to content

Display hdfs directory/file in tree structure, thoughts inspired by Linux tree command

Notifications You must be signed in to change notification settings

pangPython/hdfs-tree

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HDFS-Tree

hdfs-tree is a utility tool to display hdfs directory in tree structure, meanwhile you can sort directory by size/file count/mtime. This is useful when you want to check the disk usage of hdfs, just like what we do in bash with du -h --max-depth=1 / | sort -hr | head to check the linux file system disk usage.

Build

  1. git clone https://github.com/jacoffee/hdfs-tree.git

  2. cd ${Project_HOME} && mvn clean -DskipTests package

  3. cd ${Project_HOME}/target && java -jar hdfs-tree_2.10-1.0.jar xxx

Documentation

Possible parameters you can pass with hdfs-tree are as follows:

Usage: hdfs-tree [OPTION]
Options:

  -d, --depth  <arg>   max traverse depth for directory, default 5
  -l, --limit  <arg>   top N to display by sort, default 5
  -r, --reverse        reverse the sort or not
  -s, --sort  <arg>    sort options, must be size | mtime | count(file count)
  -t, --type  <arg>    file type to compare, d | f
      --help           Show help message

However, if you want to traverse the root directory, you should use hdfs://localhost/ instead of hdfs://localhost。 As for the reason, please check org.apache.hadoop.hdfs.DistributedFileSystem.getPathName and everything will be clear.

In hadoop cluster that supports namenode ha, you should pass the hadoop configuration directory in the class path, i.e

java -cp ${HADOOP_CONF_DIR}:hdfs-tree_2.10-1.0.jar com.jacoffee.HDFSTree $@

Also, you can configure two environment variables: ${HDFS_CONF_DIR}、${HDFS_TREE_HOME} and use the script hdfs-tree in ${HDFS_TREE_HOME}/bin directly.

1. display directory sort by size desc

java -jar hdfs-tree_2.10-1.0.jar -r -l 3 -d 2 -s size hdfs://localhost/
hdfs://localhost/

├── [       2GB] user
	├── [       2GB] allen
	├── [     526MB] hive
	├── [       1KB] platform
├── [     709MB] tmp
	├── [     709MB] hadoop-yarn
	├── [        0B] hive
├── [      99MB] spark
	├── [      99MB] eventLog

2. display directory and sort by filecount desc

java -jar hdfs-tree_2.10-1.0.jar -r -l 3 -d 2 -s count hdfs://localhost/
hdfs://localhost/
├── [       692] spark
	├── [       692] eventLog
├── [       392] tmp
	├── [       392] hadoop-yarn
	├── [         0] hive
├── [       301] user
	├── [       201] hive
	├── [        99] allen
	├── [         1] platform

3. display directory and sort by mtime desc

java -jar hdfs-tree_2.10-1.0.jar -r -l 3 -d 2 -s mtime hdfs://localhost/
hdfs://localhost/
├── [2018-02-05 09:44:07] user
	├── [2018-04-02 22:50:19] allen
	├── [2018-02-05 09:44:07] platform
	├── [2017-08-06 11:18:50] hive

About

Display hdfs directory/file in tree structure, thoughts inspired by Linux tree command

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Scala 98.6%
  • Shell 1.4%
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy