Distributed Filesystems Review
File Systems
Google File System (GFS)
Kosmos File System (KFS)
Hadoop Distributed File System (HDFS)
GlusterFS
Red Hat Global File System
Lustre
Summary
GFS Limitations
No standard API such as POSIX; file system operations are not integrated.
Some performance issues, depending on the application and client implementation.
GFS does not guarantee that all replicas are byte-wise identical; it only guarantees that the data is written at least once as an atomic unit.
Record append is atomic "at least once": GFS may insert padding or duplicate records in between.
Applications/clients may read a stale chunk replica; readers have to deal with it.
If an application write is large or straddles a chunk boundary, it may be interleaved with fragments from other clients.
Requires tight cooperation from applications (see the reader sketch after this list).
Hard links and soft links are not supported.
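Because record append is only "at least once", the burden of skipping padding and duplicates falls on the application reader. Below is a minimal sketch, assuming a hypothetical application-level record framing (magic word, length, checksum, writer-assigned id); none of these fields are part of GFS itself.

```c
/*
 * Illustrative sketch only: how a GFS-style application reader might cope
 * with "at-least-once" record append, where chunks can contain padding and
 * duplicate records. The framing (MAGIC, length, checksum, id) is a
 * hypothetical application convention, not part of GFS.
 */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define MAGIC 0x5245434FU          /* marks the start of a record            */
#define MAX_SEEN 1024              /* toy duplicate-detection table           */

struct record_hdr {
    uint32_t magic;                /* MAGIC, lets the reader skip padding     */
    uint32_t length;               /* payload length in bytes                 */
    uint32_t checksum;             /* simple sum over payload (illustrative)  */
    uint64_t id;                   /* writer-assigned unique record id        */
};

static uint32_t simple_sum(const uint8_t *p, uint32_t n)
{
    uint32_t s = 0;
    while (n--) s += *p++;
    return s;
}

static uint64_t seen[MAX_SEEN];
static size_t nseen;

static int already_seen(uint64_t id)
{
    for (size_t i = 0; i < nseen; i++)
        if (seen[i] == id) return 1;
    if (nseen < MAX_SEEN) seen[nseen++] = id;
    return 0;
}

/* Scan a chunk buffer, delivering each valid, first-seen record exactly once. */
void scan_chunk(const uint8_t *buf, size_t len,
                void (*deliver)(const uint8_t *payload, uint32_t n))
{
    size_t off = 0;
    while (off + sizeof(struct record_hdr) <= len) {
        struct record_hdr h;
        memcpy(&h, buf + off, sizeof h);
        if (h.magic != MAGIC) { off++; continue; }            /* padding      */
        if (off + sizeof h + h.length > len) break;            /* truncated    */
        const uint8_t *payload = buf + off + sizeof h;
        if (simple_sum(payload, h.length) != h.checksum) {     /* corrupt/pad  */
            off++;
            continue;
        }
        if (!already_seen(h.id))                                /* drop dupes   */
            deliver(payload, h.length);
        off += sizeof h + h.length;
    }
}
```

A cooperating writer would emit the same framing around every appended record, so any correct replica yields the same logical stream of records.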
[Diagram: KFS architecture. Location and organization signaling on the control path; block data streams move directly to and from the KFS block servers, which store their data on a local Linux FS.]
[Diagram: FUSE data path. File system operations issued through glibc are forwarded to libfuse (the FUSE user-space programming library), which hands them to the KFS block server; operation results return along the same path.]
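To make the FUSE path concrete, here is a minimal user-space file system in C following the classic libfuse 2.x "hello" example pattern: the kernel VFS hands each operation to fuse.ko, libfuse dispatches it to these callbacks, and the result travels back the same way. A KFS or GlusterFS client plugs into the same hooks; this sketch serves one read-only file and is only an illustration, not KFS code.

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

static const char *hello_path = "/hello";
static const char *hello_str  = "Hello from user space\n";

/* Report file attributes for "/" and "/hello". */
static int hello_getattr(const char *path, struct stat *stbuf)
{
    memset(stbuf, 0, sizeof(struct stat));
    if (strcmp(path, "/") == 0) {
        stbuf->st_mode = S_IFDIR | 0755;
        stbuf->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        stbuf->st_mode = S_IFREG | 0444;
        stbuf->st_nlink = 1;
        stbuf->st_size = strlen(hello_str);
    } else {
        return -ENOENT;
    }
    return 0;
}

/* List the single file in the root directory. */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void) offset; (void) fi;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);
    return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

/* Copy the requested byte range of the file's contents. */
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    (void) fi;
    size_t len = strlen(hello_str);
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if (offset >= (off_t)len)
        return 0;
    if (offset + size > len)
        size = len - offset;
    memcpy(buf, hello_str + offset, size);
    return size;
}

static struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &hello_oper, NULL);
}
```

Build with `gcc hello_fuse.c $(pkg-config --cflags --libs fuse) -o hello_fuse` and mount it on an empty directory with `./hello_fuse /mnt/test`.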
Communication protocols: RPCs. Staging and client-side data buffering (similar to a buffered POSIX write path); see the sketch below.
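A rough idea of what client-side staging/buffering looks like: small application writes accumulate locally and are shipped to the block server in one RPC when the buffer fills or is flushed. The send_rpc_write() stub below is a hypothetical stand-in for the real KFS client RPC, not its API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STAGE_SIZE (1 << 20)          /* 1 MiB staging buffer */

struct staged_file {
    char   buf[STAGE_SIZE];           /* in practice per open file, heap-allocated */
    size_t used;                      /* bytes buffered but not yet sent           */
    long   offset;                    /* file offset of buf[0]                     */
};

/* Stub standing in for the real client RPC to the block/chunk server. */
static int send_rpc_write(long offset, const void *data, size_t len)
{
    (void)data;
    printf("WRITE rpc: offset=%ld len=%zu\n", offset, len);
    return 0;
}

/* Push whatever is staged as one contiguous RPC. */
static int stage_flush(struct staged_file *f)
{
    int rc = 0;
    if (f->used) {
        rc = send_rpc_write(f->offset, f->buf, f->used);
        f->offset += (long)f->used;
        f->used = 0;
    }
    return rc;
}

/* Buffer a write; only issue an RPC when the staging buffer is full. */
int stage_write(struct staged_file *f, const void *data, size_t len)
{
    const char *p = data;
    while (len) {
        size_t room = STAGE_SIZE - f->used;
        size_t n = len < room ? len : room;
        memcpy(f->buf + f->used, p, n);
        f->used += n; p += n; len -= n;
        if (f->used == STAGE_SIZE && stage_flush(f) != 0)
            return -1;
    }
    return 0;
}
```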
GlusterFS
Gluster targets specific tasks such as HPC clustering, storage clustering, enterprise provisioning, and database clustering; its offerings include GlusterFS and GlusterHPC.
GlusterFS
[Diagram: GlusterFS client stack. Application POSIX calls go through the kernel VFS and the FUSE kernel module (fuse.ko) to libfuse, and are served by the GlusterFS client in user space.]
GlusterFS
Architecture
Different from the GoogleFS family: there is no metadata server and no master. Logical volume management is done in user space. Server node machines export their disk storage as bricks, and the brick nodes store the distributed files in an underlying Linux file system. File namespaces are also stored on storage bricks, just like the file data, except that the namespace copies have a size of zero. Bricks (for file data or namespaces) support replication. The disk layout is NFS-like. (A sketch of master-less file placement follows this slide's bullets.)
Interconnect
InfiniBand RDMA (high throughput) or TCP/IP.
Features
Supports FUSE and a complete POSIX interface. AFR (mirroring), self-heal, and striping (note: striping is not well implemented).
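The placement sketch referred to above: with no master, every client can compute a file's location by hashing its name onto the set of bricks. Real GlusterFS (the distribute translator) hashes the file name into per-directory hash ranges stored in brick extended attributes; the FNV hash and modulo mapping below are only an illustration of the idea.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* FNV-1a, used here as a stand-in for GlusterFS's internal hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

/* Hypothetical brick list; in GlusterFS this comes from the volume spec. */
static const char *bricks[] = {
    "server1:/export/brick1",
    "server2:/export/brick2",
    "server3:/export/brick3",
};

/* Every client computes the same answer with no master involved. */
static const char *brick_for(const char *filename)
{
    size_t n = sizeof bricks / sizeof bricks[0];
    return bricks[fnv1a(filename) % n];
}

int main(void)
{
    printf("%s -> %s\n", "data.log", brick_for("data.log"));
    return 0;
}
```

Because the mapping is a pure function of the name and the brick list, no metadata server has to be consulted on the data path.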
GlusterFS
Strengths
Easy to set up for a moderate-sized cluster.
FUSE support with a POSIX interface.
Scheduler modules for load balancing.
Flexible performance tuning.
Design: stackable modules (translators) implemented as run-time loadable .so files (see the sketch below); not tied to particular I/O profiles, hardware, or OS.
Well tested with several representative benchmarks. Performance and simplicity are better than Lustre's.
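The run-time .so design mentioned above boils down to dlopen()-style module loading: each translator is a shared object loaded at mount time and stacked on top of its child. The xlator_init symbol and xlator struct below are hypothetical, not GlusterFS's actual module ABI; the sketch only shows the mechanism.

```c
#include <dlfcn.h>
#include <stdio.h>

struct xlator;                                    /* opaque module handle      */
typedef struct xlator *(*xlator_init_fn)(struct xlator *child);

/* Load one translator module and stack it on top of its child. */
static struct xlator *load_translator(const char *so_path, struct xlator *child)
{
    void *dl = dlopen(so_path, RTLD_NOW);         /* pull the module in        */
    if (!dl) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    xlator_init_fn init = (xlator_init_fn)dlsym(dl, "xlator_init");
    if (!init) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return NULL;
    }
    return init(child);
}

int main(void)
{
    /* Hypothetical stack: posix storage -> replicate -> io-cache. */
    struct xlator *graph = load_translator("storage_posix.so", NULL);
    graph = load_translator("cluster_replicate.so", graph);
    graph = load_translator("performance_io_cache.so", graph);
    return graph ? 0 : 1;
}
```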
Limitations
Lacks a global management function; there is no master. The AFR function depends on static configuration and lacks automation and flexibility. At present, new bricks cannot be added automatically. With a master component added, it would be a better cluster file system.
Lustre
Developed by Sun Microsystems. Targets tens of thousands of nodes, petabytes of storage, and 100 GB/sec throughput. Lustre is kernel software that interacts with storage devices; a Lustre deployment must be correctly installed, configured, and administered to reduce the risk of security issues or data loss. It uses Object-Based Storage Devices (OSDs) to manage entire file objects (inodes) instead of blocks. Components (a conceptual read-path sketch follows this list):
Metadata Servers (MDSs)
Object Storage Targets (OSTs)
Lustre clients
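A conceptual sketch of how these components divide the work on a read: the client asks an MDS for the file's striping layout once, then moves bulk data directly to and from the OSTs. The layout struct and the two RPC declarations are hypothetical stand-ins for Lustre's real client internals, not its API.

```c
#include <stddef.h>
#include <stdint.h>

struct layout {
    uint32_t stripe_count;      /* how many OSTs the file is striped over */
    uint32_t stripe_size;       /* bytes per stripe                       */
    uint32_t ost_index[16];     /* which OST holds each stripe column     */
    uint64_t object_id[16];     /* object id of the file on each OST      */
};

/* Assumed RPCs, declarations only: metadata to the MDS, data to the OSTs. */
int mds_lookup_layout(const char *path, struct layout *out);
int ost_read_object(uint32_t ost, uint64_t obj, uint64_t off, void *buf, size_t len);

/* Read one contiguous range: one metadata RPC, then data RPCs straight to OSTs. */
int lustre_style_read(const char *path, uint64_t off, void *buf, size_t len)
{
    struct layout lo;
    if (mds_lookup_layout(path, &lo) != 0)                   /* metadata path */
        return -1;

    uint8_t *p = buf;
    while (len) {
        uint64_t stripe    = off / lo.stripe_size;           /* which stripe  */
        uint32_t idx       = stripe % lo.stripe_count;       /* which OST     */
        uint64_t obj_off   = (stripe / lo.stripe_count) * lo.stripe_size
                             + off % lo.stripe_size;
        size_t   in_stripe = lo.stripe_size - off % lo.stripe_size;
        size_t   n = len < in_stripe ? len : in_stripe;

        if (ost_read_object(lo.ost_index[idx], lo.object_id[idx],
                            obj_off, p, n) != 0)             /* data path     */
            return -1;
        off += n; p += n; len -= n;
    }
    return 0;
}
```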
Lustre is somewhat too complex to deploy casually, but it appears to be a proven and reliable file system.
Summary
Summary
Cluster Volume Managers
SAN File Systems
Cluster File Systems
Parallel NFS (pNFS)
Object-based Storage Devices (OSD)
Global/Parallel File Systems
Distribution/Cluster/Parallel levels:
Volume level (block based)
File or file-system level (file, block, or object based, in the OSD case)
Database or application level
Summary: Traditional/Historical
Block level: Volume Management
EMC PowerPath (PPVM)
HP Shared LVM
IBM LVM
MACROIMPACT SAN CVM
Red Hat LVM
Sanbolic LaScala
VERITAS