Red Hat Gluster Storage 3.3
Administration Guide
Divya Muntimadugu
Red Hat Customer Content Services
divya@redhat.com
Bhavana Mohanraj
Red Hat Customer Content Services
bmohanra@redhat.com
Laura Bailey
Red Hat Customer Content Services
lbailey@redhat.com
This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0
Unported License. If you distribute this document, or a modified version of it, you must provide
attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red
Hat trademarks must be removed.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert,
Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity
logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other
countries.
Linux ® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United
States and/or other countries.
MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and
other countries.
Node.js ® is an official trademark of Joyent. Red Hat Software Collections is not formally related
to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack ® Word Mark and OpenStack logo are either registered trademarks/service marks
or trademarks/service marks of the OpenStack Foundation, in the United States and other
countries and are used with the OpenStack Foundation's permission. We are not affiliated with,
endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
Abstract
The Red Hat Gluster Storage Administration Guide describes the configuration and management of
Red Hat Gluster Storage for On-Premise.
Table of Contents

PART I. PREFACE
CHAPTER 1. PREFACE
1.1. ABOUT RED HAT GLUSTER STORAGE
1.2. ABOUT GLUSTERFS
1.3. ABOUT ON-PREMISES INSTALLATION

PART II. OVERVIEW
CHAPTER 2. ARCHITECTURE AND CONCEPTS
2.1. ARCHITECTURE
2.2. ON-PREMISES ARCHITECTURE
2.3. STORAGE CONCEPTS

PART III. CONFIGURE AND VERIFY
CHAPTER 3. CONSIDERATIONS FOR RED HAT GLUSTER STORAGE
3.1. VERIFYING PORT ACCESS
3.2. FEATURE COMPATIBILITY SUPPORT
CHAPTER 4. ADDING SERVERS TO THE TRUSTED STORAGE POOL
4.1. ADDING SERVERS TO THE TRUSTED STORAGE POOL
4.2. REMOVING SERVERS FROM THE TRUSTED STORAGE POOL
CHAPTER 5. SETTING UP STORAGE VOLUMES
5.1. SETTING UP GLUSTER STORAGE VOLUMES USING GDEPLOY
5.2. MANAGING VOLUMES USING HEKETI
5.3. ABOUT ENCRYPTED DISK
5.4. FORMATTING AND MOUNTING BRICKS
5.5. CREATING DISTRIBUTED VOLUMES
5.6. CREATING REPLICATED VOLUMES
5.7. CREATING DISTRIBUTED REPLICATED VOLUMES
5.8. CREATING ARBITRATED REPLICATED VOLUMES
5.9. CREATING DISPERSED VOLUMES
5.10. CREATING DISTRIBUTED DISPERSED VOLUMES
5.11. STARTING VOLUMES
CHAPTER 6. CREATING ACCESS TO VOLUMES
6.1. NATIVE CLIENT
6.2. NFS
6.3. SMB
6.4. POSIX ACCESS CONTROL LISTS
6.5. MANAGING OBJECT STORE
6.6. CHECKING CLIENT OPERATING VERSIONS
CHAPTER 7. INTEGRATING RED HAT GLUSTER STORAGE WITH WINDOWS ACTIVE DIRECTORY
7.1. PREREQUISITES
7.2. INTEGRATION

PART IV. MANAGE
CHAPTER 8. MANAGING SNAPSHOTS
8.1. PREREQUISITES
8.2. CREATING SNAPSHOTS
8.3. CLONING A SNAPSHOT
CHAPTER 9. MANAGING DIRECTORY QUOTAS
9.1. ENABLING AND DISABLING QUOTAS
9.2. BEFORE SETTING A QUOTA ON A DIRECTORY
9.3. LIMITING DISK USAGE
CHAPTER 10. MANAGING GEO-REPLICATION
10.1. ABOUT GEO-REPLICATION
10.2. REPLICATED VOLUMES VS GEO-REPLICATION
10.3. PREPARING TO DEPLOY GEO-REPLICATION
10.4. STARTING GEO-REPLICATION
10.5. STARTING GEO-REPLICATION ON A NEWLY ADDED BRICK OR NODE
10.6. SCHEDULING GEO-REPLICATION AS A CRON JOB
10.7. DISASTER RECOVERY
10.8. CREATING A SNAPSHOT OF GEO-REPLICATED VOLUME
10.9. EXAMPLE - SETTING UP CASCADING GEO-REPLICATION
10.10. RECOMMENDED PRACTICES
10.11. TROUBLESHOOTING GEO-REPLICATION
CHAPTER 11. MANAGING RED HAT GLUSTER STORAGE VOLUMES
11.1. CONFIGURING VOLUME OPTIONS
11.2. SETTING MULTIPLE VOLUME OPTION
11.3. SUPPORTED VOLUME OPTIONS
11.4. CONFIGURING TRANSPORT TYPES FOR A VOLUME
11.5. EXPANDING VOLUMES
11.6. SHRINKING VOLUMES
11.7. MIGRATING VOLUMES
11.8. REPLACING HOSTS
11.9. REBALANCING VOLUMES
11.10. SETTING UP SHARED STORAGE VOLUME
11.11. STOPPING VOLUMES
11.12. DELETING VOLUMES
11.13. MANAGING SPLIT-BRAIN
11.14. RECOMMENDED CONFIGURATIONS - DISPERSED VOLUME
CHAPTER 12. MANAGING RED HAT GLUSTER STORAGE LOGS
12.1. LOG ROTATION
12.2. RED HAT GLUSTER STORAGE COMPONENT LOGS AND LOCATION
12.3. CONFIGURING THE LOG FORMAT
12.4. CONFIGURING THE LOG LEVEL
12.5. SUPPRESSING REPETITIVE LOG MESSAGES
12.6. GEO-REPLICATION LOGS
CHAPTER 13. MANAGING RED HAT GLUSTER STORAGE VOLUME LIFE-CYCLE EXTENSIONS
13.1. LOCATION OF SCRIPTS
13.2. PREPACKAGED SCRIPTS
CHAPTER 14. MANAGING CONTAINERIZED RED HAT GLUSTER STORAGE
14.1. PREREQUISITES
14.2. STARTING A CONTAINER
14.3. CREATING A TRUSTED STORAGE POOL
14.4. CREATING A VOLUME
14.5. MOUNTING A VOLUME
CHAPTER 15. DETECTING BITROT
15.1. ENABLING AND DISABLING THE BITROT DAEMON
15.2. MODIFYING BITROT DETECTION BEHAVIOR
15.3. RESTORING A BAD FILE
CHAPTER 16. INCREMENTAL BACKUP ASSISTANCE USING GLUSTERFIND
16.1. GLUSTERFIND CONFIGURATION OPTIONS
CHAPTER 17. MANAGING TIERING
17.1. TIERING ARCHITECTURE
17.2. KEY BENEFITS OF TIERING
17.3. TIERING LIMITATIONS
17.4. ATTACHING A TIER TO A VOLUME
17.5. CONFIGURING A TIERING VOLUME
17.6. DISPLAYING TIERING STATUS INFORMATION
17.7. DETACHING A TIER FROM A VOLUME

PART V. MONITOR AND TUNE
CHAPTER 18. MONITORING RED HAT GLUSTER STORAGE
18.1. PREREQUISITES
18.2. INSTALLING NAGIOS
18.3. MONITORING RED HAT GLUSTER STORAGE TRUSTED STORAGE POOL
18.4. MONITORING NOTIFICATIONS
18.5. NAGIOS ADVANCED CONFIGURATION
18.6. CONFIGURING NAGIOS MANUALLY
18.7. TROUBLESHOOTING NAGIOS
CHAPTER 19. MONITORING RED HAT GLUSTER STORAGE GLUSTER WORKLOAD
19.1. RUNNING THE VOLUME PROFILE COMMAND
19.2. RUNNING THE VOLUME TOP COMMAND
19.3. GSTATUS COMMAND
19.4. LISTING VOLUMES
19.5. DISPLAYING VOLUME INFORMATION
19.6. OBTAINING NODE INFORMATION
19.7. RETRIEVING CURRENT VOLUME OPTION SETTINGS
19.8. VIEWING COMPLETE VOLUME STATE WITH STATEDUMP
19.9. DISPLAYING VOLUME STATUS
19.10. TROUBLESHOOTING ISSUES IN THE RED HAT GLUSTER STORAGE TRUSTED STORAGE POOL
CHAPTER 20. MANAGING RESOURCE USAGE
CHAPTER 21. TUNING FOR PERFORMANCE
21.1. DISK CONFIGURATION
21.2. BRICK CONFIGURATION
CHAPTER 22. NAGIOS CONFIGURATION FILES

PART VI. SECURITY
CHAPTER 23. CONFIGURING NETWORK ENCRYPTION IN RED HAT GLUSTER STORAGE
23.1. PREREQUISITES
23.2. CONFIGURING NETWORK ENCRYPTION FOR A NEW TRUSTED STORAGE POOL
23.3. CONFIGURING NETWORK ENCRYPTION FOR AN EXISTING TRUSTED STORAGE POOL
23.4. EXPANDING VOLUMES
23.5. AUTHORIZING A NEW CLIENT

PART VII. TROUBLESHOOT
CHAPTER 24. RESOLVING COMMON ISSUES
24.1. IDENTIFYING LOCKED FILE AND CLEAR LOCKS
24.2. RETRIEVING FILE PATH FROM THE GLUSTER VOLUME

PART VIII. APPENDICES
CHAPTER 25. STARTING AND STOPPING THE GLUSTERD SERVICE
CHAPTER 26. MANUALLY RECOVERING FILE SPLIT-BRAIN
APPENDIX A. REVISION HISTORY
PART I. PREFACE
CHAPTER 1. PREFACE
Red Hat Gluster Storage provides new opportunities to unify data storage and infrastructure, increase
performance, and improve availability and manageability in order to meet a broader set of an
organization’s storage challenges and needs.
The POSIX-compatible glusterFS servers, which use the XFS file system format to store data on disks, can
be accessed using industry-standard access protocols including Network File System (NFS) and Server
Message Block (SMB), which is also known as CIFS.
Red Hat Gluster Storage can be installed on commodity servers resulting in a powerful, massively
scalable, and highly available NAS environment.
PART II. OVERVIEW
CHAPTER 2. ARCHITECTURE AND CONCEPTS
2.1. ARCHITECTURE
At the core of the Red Hat Gluster Storage design is a completely new method of architecting storage.
The result is a system that has immense scalability, is highly resilient, and offers extraordinary
performance.
In a scale-out system, one of the biggest challenges is keeping track of the logical and physical
locations of data and metadata. Most distributed systems solve this problem by creating a metadata
server to track the location of data and metadata. As traditional systems add more files, more servers,
or more disks, the central metadata server becomes a performance bottleneck, as well as a central
point of failure.
Unlike other traditional storage solutions, Red Hat Gluster Storage does not need a metadata server,
and locates files algorithmically using an elastic hashing algorithm. This no-metadata server
architecture ensures better performance, linear scalability, and reliability.
2.2. ON-PREMISES ARCHITECTURE
Red Hat Gluster Storage for On-premises enables enterprises to treat physical storage as a virtualized,
scalable, and centrally managed storage pool by using commodity storage hardware.
It supports multi-tenancy by partitioning users or groups into logical volumes on shared storage. It
enables users to eliminate, decrease, or manage their dependence on high-cost, monolithic and
difficult-to-deploy storage arrays.
You can add capacity in a matter of minutes across a wide variety of workloads without affecting
performance. Storage can also be centrally managed across a variety of workloads, thus increasing
storage efficiency.
Red Hat Gluster Storage for On-premises is based on glusterFS, an open source distributed file system
with a modular, stackable design, and a unique no-metadata server architecture. This no-metadata
server architecture ensures better performance, linear scalability, and reliability.
2.3. STORAGE CONCEPTS

Brick
The glusterFS basic unit of storage, represented by an export directory on a server in the trusted
storage pool. A brick is expressed by combining a server with an export directory in the following
format:
SERVER:EXPORT
For example:
myhostname:/exports/myexportdir/
Volume
A volume is a logical collection of bricks. Most of the Red Hat Gluster Storage management
operations happen on the volume.
Translator
A translator connects to one or more subvolumes, performs operations on them, and offers a
subvolume connection.
Subvolume
A brick after being processed by at least one translator.
Volfile
Volume (vol) files are configuration files that determine the behavior of your Red Hat Gluster
Storage trusted storage pool. At a high level, GlusterFS has three entities: the server, the client, and the
management daemon. Each of these entities has its own volume files. Volume files for servers and
clients are generated by the management daemon upon creation of a volume.
Server and client vol files are located in the /var/lib/glusterd/vols/VOLNAME directory. The
management daemon vol file is named glusterd.vol and is located in the /etc/glusterfs/
directory.
WARNING
You must not modify any vol file in /var/lib/glusterd manually as Red Hat
does not support vol files that are not generated by the management daemon.
glusterd
glusterd is the glusterFS Management Service that must run on all servers in the trusted storage
pool.
Cluster
A trusted pool of linked computers working together, resembling a single computing resource. In
Red Hat Gluster Storage, a cluster is also referred to as a trusted storage pool.
Client
The machine that mounts a volume (this may also be a server).
File System
A method of storing and organizing computer files. A file system organizes files into a database for
the storage, manipulation, and retrieval by the computer's operating system.
Source: Wikipedia
POSIX
Portable Operating System Interface (for Unix) (POSIX) is the name of a family of related standards
specified by the IEEE to define the application programming interface (API), as well as shell and
utilities interfaces, for software that is compatible with variants of the UNIX operating system. Red
Hat Gluster Storage exports a fully POSIX compatible file system.
Metadata
Metadata is data providing information about other pieces of data.
FUSE
Filesystem in User space (FUSE) is a loadable kernel module for Unix-like operating systems that
lets non-privileged users create their own file systems without editing kernel code. This is achieved
by running file system code in user space while the FUSE module provides only a "bridge" to the
kernel interfaces.
Source: Wikipedia
Geo-Replication
Geo-replication provides a continuous, asynchronous, and incremental replication service from one
site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and the Internet.
N-way Replication
Local synchronous data replication that is typically deployed across campus or Amazon Web
Services Availability Zones.
Petabyte
A petabyte is a unit of information equal to one quadrillion bytes, or 1000 terabytes. The unit
symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000:
1 PB = 1,000,000,000,000,000 B = 10^15 B.
The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024.
Source: Wikipedia
RAID
Redundant Array of Independent Disks (RAID) is a technology that provides increased storage
reliability through redundancy. It combines multiple low-cost, less-reliable disk drive components
into a logical unit where all drives in the array are interdependent.
RRDNS
Round Robin Domain Name Service (RRDNS) is a method to distribute load across application
servers. RRDNS is implemented by creating multiple records with the same name and different IP
addresses in the zone file of a DNS server.
Server
The machine (virtual or bare metal) that hosts the file system in which data is stored.
Block Storage
Block special files, or block devices, correspond to devices through which the system moves data in
the form of blocks. These device nodes often represent addressable devices such as hard disks, CD-
ROM drives, or memory regions. As of Red Hat Gluster Storage 3.3, block storage supports only
Container-Native Storage (CNS) and Container-Ready Storage (CRS) use cases. Block storage can
be created and configured for this use case by using the gluster-block command line tool. For
more information, see Container-Native Storage for OpenShift Container Platform.
Scale-Up Storage
Increases the capacity of the storage device in a single dimension. For example, adding additional
disk capacity in a trusted storage pool.
Scale-Out Storage
Increases the capability of a storage device across multiple dimensions. For example, adding more
systems of the same size, or adding servers to a trusted storage pool, increases CPU, disk capacity,
and throughput for the trusted storage pool.
Namespace
An abstract container or environment that is created to hold a logical grouping of unique identifiers
or symbols. Each Red Hat Gluster Storage trusted storage pool exposes a single namespace as a
POSIX mount point which contains every file in the trusted storage pool.
User Space
Applications running in user space do not directly interact with hardware, instead using the kernel
to moderate access. User space applications are generally more portable than applications in
kernel space. glusterFS is a user space application.
Hashed subvolume
The Distributed Hash Table Translator subvolume to which the file or directory name hashes.
Cached subvolume
The Distributed Hash Table Translator subvolume where the file content is actually present. For
directories, the concept of a cached subvolume is not relevant; the term is loosely used to mean
subvolumes that are not the hashed subvolume.
Linkto-file
For a newly created file, the hashed and cached subvolumes are the same. When directory entry
operations like rename (which can change the name and hence hashed subvolume of the file) are
performed on the file, instead of moving the entire data in the file to a new hashed subvolume, a file
is created with the same name on the newly hashed subvolume. The purpose of this file is only to
act as a pointer to the node where the data is present. In the extended attributes of this file, the
name of the cached subvolume is stored. This file on the newly hashed-subvolume is called a linkto-
file. The linkto file is relevant only for non-directory entities.
Directory Layout
The directory layout specifies the hash ranges of the subdirectories of a directory and the
subvolumes to which they correspond.
The layouts are created at the time of directory creation and are persisted as extended
attributes of the directory.
A subvolume is not included in the layout if it remained offline at the time of directory
creation and no directory entries (such as files and directories) of that directory are
created on that subvolume. The subvolume is not part of the layout until the fix-layout is
complete as part of running the rebalance command. If a subvolume is down during access
(after directory creation), access to any files that hash to that subvolume fails.
Fix Layout
A command that is executed during the rebalance process. It fixes the layouts of directories to
accommodate any subvolumes that are added or removed. It also heals the directories, checks
whether the layout is non-contiguous, and persists the layout in extended attributes, if needed.
It also ensures that the directories have the same attributes across all the subvolumes.
PART III. CONFIGURE AND VERIFY
CHAPTER 3. CONSIDERATIONS FOR RED HAT GLUSTER STORAGE
The Red Hat Gluster Storage glusterFS daemon glusterd enables dynamic configuration changes to
Red Hat Gluster Storage volumes, without needing to restart servers or remount storage volumes on
clients.
3.1. VERIFYING PORT ACCESS

Red Hat Gluster Storage Server uses the listed ports. You must ensure that the firewall settings do not
prevent access to these ports.
Firewall configuration tools differ between Red Hat Enterprise Linux 6 and Red Hat Enterprise Linux 7.
For Red Hat Enterprise Linux 6, use the iptables command to open a port:
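For example, a minimal sketch that opens a single TCP port and saves the rule (port 24007, the glusterd management port, is used here purely as an illustration; substitute whichever port you need to open):
# iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 24007 -j ACCEPT
# service iptables save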
For Red Hat Enterprise Linux 7, if the default ports are not already in use by other services, it is usually
simpler to add a service rather than open a port:
However, if the default ports are already in use, you can open a specific port with the following
command:
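For example, a sketch that opens a specific TCP port range in a firewalld zone, for the current runtime and permanently (the zone name and port numbers are illustrative):
# firewall-cmd --zone=public --add-port=24007-24008/tcp
# firewall-cmd --zone=public --add-port=24007-24008/tcp --permanent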
Table 3.2. TCP Port Numbers used for Object Storage (Swift)
For more information about port and firewall details for NFS-Ganesha, refer to Section 6.2.3.2.1, “Port
and Firewall Information for NFS-Ganesha”.
3.2. FEATURE COMPATIBILITY SUPPORT

Features in the following table are supported from the specified version and later.
Feature        Version
SELinux        3.1
Sharding       3.1.3
Snapshots      3.0
Tiering [a]    3.1.2

[a] See Section 17.3, Tiering Limitations in the Red Hat Gluster Storage 3.3 Administration Guide for details.
CHAPTER 4. ADDING SERVERS TO THE TRUSTED STORAGE POOL
When the first server starts, the storage pool consists of that server alone. Adding additional storage
servers to the storage pool is achieved using the probe command from a running, trusted storage
server.
IMPORTANT
Before adding servers to the trusted storage pool, you must ensure that the ports
specified in Chapter 3, Considerations for Red Hat Gluster Storage are open.
On Red Hat Enterprise Linux 7, enable the glusterFS firewall service in the active zones
for runtime and permanent mode using the following commands:
# firewall-cmd --get-active-zones
To allow the firewall service in the active zones, run the following commands:
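A minimal sketch, assuming public is one of the active zones reported by the previous command (substitute your own zone names; the glusterfs firewalld service definition is shipped with the Red Hat Gluster Storage packages):
# firewall-cmd --zone=public --add-service=glusterfs
# firewall-cmd --zone=public --add-service=glusterfs --permanent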
For more information about using firewalls, see section Using Firewalls in the Red Hat
Enterprise Linux 7 Security Guide: https://access.redhat.com/documentation/en-
US/Red_Hat_Enterprise_Linux/7/html/Security_Guide/sec-Using_Firewalls.html.
NOTE
When any two gluster commands are executed concurrently on the same volume, the
following error is displayed:
This behavior in Red Hat Gluster Storage prevents two or more commands from simultaneously
modifying a volume configuration and potentially leaving it in an inconsistent state. This situation
is common in environments with monitoring frameworks such as the Red Hat Gluster Storage
Console, Red Hat Enterprise Virtualization Manager, and Nagios. For example, in a four-node Red
Hat Gluster Storage Trusted Storage Pool, this message is observed when the gluster volume
status VOLNAME command is executed from two of the nodes simultaneously.
NOTE
Probing a higher-version Red Hat Gluster Storage node from a node running a lower version is
not supported.
4.1. ADDING SERVERS TO THE TRUSTED STORAGE POOL

Create a trusted storage pool consisting of three storage servers, which comprise a volume.
Prerequisites
The glusterd service must be running on all storage servers requiring addition to the trusted
storage pool. See Chapter 25, Starting and Stopping the glusterd service for service start and
stop commands.
1. Run gluster peer probe [server] from Server 1 to add additional servers to the trusted
storage pool.
NOTE
All the servers in the Trusted Storage Pool must have RDMA devices if
either RDMA or RDMA,TCP volumes are created in the storage pool. The peer
probe must be performed using IP/hostname assigned to the RDMA device.
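For example, assuming the additional servers are reachable as server2, server3, and server4 (hostnames chosen to match the sample output below), the probe commands would be:
# gluster peer probe server2
# gluster peer probe server3
# gluster peer probe server4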
2. Verify the peer status from all servers using the following command:
# gluster peer status
Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)
Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)
Hostname: server4
Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
IMPORTANT
If the existing trusted storage pool has a geo-replication session, then after adding the
new server to the trusted storage pool, perform the steps listed at Section 10.5,
“Starting Geo-replication on a Newly Added Brick or Node”.
4.2. REMOVING SERVERS FROM THE TRUSTED STORAGE POOL

Remove one server from the Trusted Storage Pool, and check the peer status of the storage pool.
Prerequisites
The glusterd service must be running on the server targeted for removal from the storage
pool. See Chapter 25, Starting and Stopping the glusterd service for service start and stop
commands.
1. Run gluster peer detach [server] to remove the server from the trusted storage pool.
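For example, assuming server4 is the server being removed (consistent with the sample output below, which no longer lists it):
# gluster peer detach server4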
2. Verify the peer status from all servers using the following command:
# gluster peer status
Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)
Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

CHAPTER 5. SETTING UP STORAGE VOLUMES
WARNING
Red Hat does not support writing data directly into the bricks. Read and write data
only through the Native Client, or through NFS or SMB mounts.
NOTE
Red Hat Gluster Storage supports IP over Infiniband (IPoIB). Install Infiniband packages
on all Red Hat Gluster Storage servers and clients to support this feature. Run the yum
groupinstall "Infiniband Support" to install Infiniband packages.
Volume Types
Distributed
Distributes files across bricks in the volume.
Use this volume type where scaling and redundancy requirements are not important, or provided by
other hardware or software layers.
See Section 5.5, “Creating Distributed Volumes” for additional information about this volume type.
Replicated
Replicates files across bricks in the volume.
Use this volume type in environments where high-availability and high-reliability are critical.
See Section 5.6, “Creating Replicated Volumes” for additional information about this volume type.
Distributed Replicated
Distributes files across replicated bricks in the volume.
Use this volume type in environments where high-reliability and scalability are critical. This volume
type offers improved read performance in most environments.
See Section 5.7, “Creating Distributed Replicated Volumes” for additional information about this
volume type.
Arbitrated Replicated
Replicates files across bricks in the volume, except for every third brick, which stores only
metadata.
Use this volume type in environments where consistency is critical, but underlying storage space is
at a premium.
See Section 5.8, “Creating Arbitrated Replicated Volumes” for additional information about this
volume type.
Dispersed
Disperses the file's data across the bricks in the volume.
Use this volume type where you need a configurable level of reliability with a minimum space waste.
See Section 5.9, “Creating Dispersed Volumes” for additional information about this volume type.
Distributed Dispersed
Distributes the file's data across dispersed subvolumes.
Use this volume type where you need a configurable level of reliability with a minimum space waste.
See Section 5.10, “Creating Distributed Dispersed Volumes” for additional information about this
volume type.
5.1. SETTING UP GLUSTER STORAGE VOLUMES USING GDEPLOY

When setting up a new trusted storage pool, gdeploy is often the preferred method, as manually
executing numerous commands can be error prone. The advantages of using gdeploy are:
Setting up the backend on several machines can be done from one's laptop or desktop. This
saves time and scales up well when the number of nodes in the trusted storage pool increases.
Flexibility in naming the logical volumes (LV) and volume groups (VG).
Prerequisites
1. Generate the passphrase-less SSH keys for the nodes which are going to be part of the trusted
storage pool by running the following command:
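For example, one way to generate a passphrase-less RSA key pair on the deployment node (the key type and options shown are illustrative):
# ssh-keygen -t rsa -N ''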
2. Set up password-less SSH access between the gdeploy controller and servers by running the
following command:
# ssh-copy-id -i root@server
NOTE
If you are using a Red Hat Gluster Storage node as the deployment node and not
an external node, then the password-less SSH must be set up for the Red Hat
Gluster Storage node from where the installation is performed using the
following command:
# ssh-copy-id -i root@localhost
For Red Hat Gluster Storage 3.3 on Red Hat Enterprise Linux 7.4, execute the following
command:
For multiple devices, use multiple volume groups, thin pools, and thin volumes in the gdeploy
configuration file.
gdeploy can be used to deploy Red Hat Gluster Storage in two ways: using a node in the trusted storage pool, or using a machine outside the trusted storage pool.
For more information on installing gdeploy, see the Installing Ansible to Support Gdeploy section in the Red
Hat Gluster Storage 3.3 Installation Guide.
/usr/share/doc/gdeploy/examples/gluster.conf.sample
NOTE
The trusted storage pool can be created either by performing each task independently, such as
setting up a backend, creating a volume, and mounting volumes, or by combining the tasks in a
single configuration file.
For example, for a basic trusted storage pool of a 2 x 2 replicated volume, the configuration details in
the configuration file will be as follows:
2x2-volume-create.conf:
#
# Usage:
# gdeploy -c 2x2-volume-create.conf
#
# This does backend setup first and then create the volume using the
# setup bricks.
#
#
[hosts]
10.70.46.13
10.70.46.17

[backend-setup]
devices=sdb,sdc
[volume]
action=create
volname=sample_volname
replica=yes
replica_count=2
force=yes
[clients]
action=mount
volname=sample_volname
hosts=10.70.46.15
fstype=glusterfs
client_mount_points=/mnt/gluster
With this configuration, a 2 x 2 replicated trusted storage pool volume named sample_volname is
created on the given IP addresses, using /dev/sdb and /dev/sdc as the backend devices.
For more information on possible values, see Section 5.1.7, “Configuration File”
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
NOTE
You can create a new configuration file by referencing the template file available at
/usr/share/doc/gdeploy/examples/gluster.conf.sample. To invoke the new
configuration file, run gdeploy -c /path_to_file/config.txt command.
To only set up the backend, see Section 5.1.3, “Setting up the Backend”.
/usr/share/doc/gdeploy/examples/gluster.conf.sample
Creating Physical Volume (PV), Volume Group (VG), and Logical Volume (LV) individually
NOTE
For Red Hat Enterprise Linux 6, the xfsprogs package must be installed before setting
up the backend bricks using gdeploy.
Backend setup can be done on specific machines or on all the machines. The backend-setup module
internally creates PV, VG, and LV and mounts the device. Thin-provisioned logical volumes are created as per
the performance recommendations by Red Hat.
Generic
Specific
Generic
If the disk names are uniform across the machines, then the backend setup can be written as below. The
backend is set up for all the hosts in the `hosts' section.
For more information on possible values, see Section 5.1.7, “Configuration File”
#
# Usage:
# gdeploy -c backend-setup-generic.conf
#
# This configuration creates backend for GlusterFS clusters
#
[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77
# Backend setup for all the nodes in the `hosts' section. This will create
# PV, VG, and LV with gdeploy generated names.
[backend-setup]
devices=vdb
Specific
If the disk names vary across the machines in the cluster, then the backend setup can be written for
specific machines with specific disk names. gdeploy is quite flexible in allowing host-specific
setup in a single configuration file.
For more information on possible values, see Section 5.1.7, “Configuration File”
#
# Usage:
# gdeploy -c backend-setup-hostwise.conf
#
# This configuration creates backend for GlusterFS clusters
#
[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77
# Backend setup for 10.70.46.77 with default gdeploy generated names for
# Volume Groups and Logical Volumes. Volume names will be GLUSTER_vg1,
# GLUSTER_vg2...
[backend-setup:10.70.46.77]
devices=vda,vdb
# Backend setup for remaining 3 hosts in the `hosts' section with custom
# names for Volume Groups and Logical Volumes.
[backend-setup:10.70.46.{130,32,110}]
devices=vdb,vdc,vdd
vgs=vg1,vg2,vg3
pools=pool1,pool2,pool3
lvs=lv1,lv2,lv3
mountpoints=/rhgs/brick1,/rhgs/brick2,/rhgs/brick3
brick_dirs=/rhgs/brick1/b1,/rhgs/brick2/b2,/rhgs/brick3/b3
If the user needs more control over setting up the backend, then the pv, vg, and lv modules can be used to
create them individually. The LV module provides the flexibility to create more than one LV on a VG. For
example, the `backend-setup' module sets up a thin pool by default and applies the default performance
recommendations. However, if the user has a different use case that demands more than one LV, and
a combination of thin and thick pools, then `backend-setup' is of no help. The user can use the PV, VG, and
LV modules to achieve this.
For more information on possible values, see Section 5.1.7, “Configuration File”
The example below shows how to create four logical volumes on a single volume group. The example
shows a mix of thin and thick pool LV creation.
[hosts]
10.70.46.130
10.70.46.32
[pv]
action=create
devices=vdb
[vg1]
action=create
vgname=RHS_vg1
pvname=vdb
[lv1]
action=create
vgname=RHS_vg1
lvname=engine_lv
lvtype=thick
size=10GB
mount=/rhgs/brick1
[lv2]
action=create
vgname=RHS_vg1
poolname=lvthinpool
lvtype=thinpool
poolmetadatasize=200MB
chunksize=1024k
size=30GB
[lv3]
action=create
lvname=lv_vmaddldisks
poolname=lvthinpool
vgname=RHS_vg1
lvtype=thinlv
mount=/rhgs/brick2
virtualsize=9GB
[lv4]
action=create
lvname=lv_vmrootdisks
poolname=lvthinpool
vgname=RHS_vg1
size=19GB
lvtype=thinlv
mount=/rhgs/brick3
virtualsize=19GB
#
# Extends a given VG. pvname and vgname are mandatory; in this example
# the vg `RHS_vg1' is extended by adding the pv, vdd. If the pv is not
# already present, it is created by gdeploy.
#
[hosts]
10.70.46.130
10.70.46.32
[vg2]
action=extend
vgname=RHS_vg1
pvname=vdd
/usr/share/doc/gdeploy/examples/gluster.conf.sample
For example, for a basic trusted storage pool of a 2 x 2 replicated volume, the configuration details in the
configuration file will be as follows:
[hosts]
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
[volume]
action=create
volname=glustervol
transport=tcp,rdma
replica=yes
replica_count=2
force=yes
For more information on possible values, see Section 5.1.7, “Configuration File”
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
NOTE
Creating multiple volumes is supported only from gdeploy 2.0 onwards. Check your gdeploy
version before trying this configuration.
While creating multiple volumes in a single configuration, the [volume] modules should be numbered.
For example, if there are two volumes, they will be numbered [volume1] and [volume2].
vol-create.conf
[hosts]
10.70.46.130
10.70.46.32
[backend-setup]
devices=vdb,vdc
mountpoints=/mnt/data1,/mnt/data2
[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=2
brick_dirs=/mnt/data1/1
[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
replica_count=2
brick_dirs=/mnt/data2/2
With gdeploy 2.0, a volume can be created with multiple volume options set. The number of keys should
match the number of values.
[hosts]
10.70.46.130
10.70.46.32
[backend-setup]
devices=vdb,vdc
mountpoints=/mnt/data1,/mnt/data2
[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=2
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
brick_dirs=/mnt/data1/1
[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
replica_count=2
brick_dirs=/mnt/data2/2
The above configuration will create two volumes with multiple volume options set.
/usr/share/doc/gdeploy/examples/gluster.conf.sample
Following is an example of the modifications to the configuration file in order to mount clients:
[clients]
action=mount
hosts=10.70.46.159
fstype=glusterfs
client_mount_points=/mnt/gluster
volname=10.0.0.1:glustervol
For more information on possible values, see Section 5.1.7, “Configuration File”
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Adding a Brick
Modify the [volume] section in the configuration file to add a brick. For example:
[volume]
action=add-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.1:/rhgs/new_brick
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Removing a Brick
Modify the [volume] section in the configuration file to remove a brick. For example:
[volume]
action=remove-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.2:/rhgs/brick
state=commit
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”
Modify the [volume] section in the configuration file to rebalance a volume. For example:
[volume]
action=rebalance
volname=10.70.46.13:glustervol
state=start
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”
Starting a Volume
Modify the [volume] section in the configuration file to start a volume. For example:
[volume]
action=start
volname=10.0.0.1:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Stopping a Volume
Modify the [volume] section in the configuration file to stop a volume. For example:
[volume]
action=stop
volname=10.0.0.1:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Deleting a Volume
Modify the [volume] section in the configuration file to delete a volume. For example:
[volume]
action=delete
volname=10.70.46.13:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”
The configuration file includes the various options that can be used to change the settings for gdeploy.
With the new release of gdeploy, the configuration file has many more sections and enhanced variables
in the existing sections. The following sections are currently supported:
[hosts]
[devices]
[disktype]
[diskcount]
[stripesize]
[vgs]
[pools]
[lvs]
[mountpoints]
{host-specific-data-for-above}
[clients]
[volume]
[backend-setup]
[pv]
[vg]
[lv]
[RH-subscription]
[yum]
[shell]
[update-file]
[service]
[script]
[firewalld]
hosts
This is a mandatory section which contains the IP address or hostname of the machines in the
trusted storage pool. Each hostname or IP address should be listed in a separate line.
For example:
[hosts]
10.0.0.1
10.0.0.2
devices
This is a generic section and is applicable to all the hosts listed in the [hosts] section. However,
if host-specific sections such as [hostname] or [IP-address] are present, then the data in the
generic sections like [devices] is ignored. Host-specific data takes precedence. This is an
optional section.
For example:
[devices]
/dev/sda
/dev/sdb
NOTE
When configuring the backend setup, the devices should be either listed in this
section or in the host specific section.
disktype
This section specifies the disk configuration that is used while setting up the backend. gdeploy
supports RAID 10, RAID 6, RAID 5, and JBOD configurations. This is an optional section and if
the field is left empty, JBOD is taken as the default configuration. Valid values for this field are
raid10, raid6, raid5, and jbod.
For example:
[disktype]
raid6
diskcount
This section specifies the number of data disks in the setup. This is a mandatory field if a RAID
disk type is specified under [disktype]. If the [disktype] is JBOD the [diskcount] value is
ignored. This parameter is host specific.
For example:
[diskcount]
10
stripesize
This section specifies the stripe_unit size in KB.
This field is not necessary if the [disktype] is JBOD, and any given value will be ignored.
For [disktype] RAID 10, the default value is taken as 256KB. Red Hat does not recommend
changing this value. If you specify any other value the following warning is displayed:
NOTE
Do not add any suffixes like K, KB, M, etc. This parameter is host specific and can
be added in the hosts section.
For example:
[stripesize]
128
vgs
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. This section specifies the volume group names for the devices listed in [devices].
The number of volume groups in the [vgs] section should match the one in [devices]. If the
volume group names are missing, the volume groups will be named as GLUSTER_vg{1, 2, 3, ...}
as default.
For example:
[vgs]
CUSTOM_vg1
CUSTOM_vg2
pools
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. This section specifies the pool names for the volume groups specified in the [vgs]
section. The number of pools listed in the [pools] section should match the number of volume
groups in the [vgs] section. If the pool names are missing, the pools will be named as
GLUSTER_pool{1, 2, 3, ...}.
For example:
[pools]
CUSTOM_pool1
CUSTOM_pool2
lvs
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. This section provides the logical volume names for the volume groups specified in
[vgs]. The number of logical volumes listed in the [lvs] section should match the number of
volume groups listed in [vgs]. If the logical volume names are missing, it is named as
GLUSTER_lv{1, 2, 3, ...}.
For example:
[lvs]
CUSTOM_lv1
CUSTOM_lv2
mountpoints
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. This section specifies the brick mount points for the logical volumes. The number
of mount points should match the number of logical volumes specified in [lvs]. If the mount
points are missing, the mount points will be named as /gluster/brick{1, 2, 3…}.
For example:
[mountpoints]
/rhgs/brick1
/rhgs/brick2
brick_dirs
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. This is the directory which will be used as a brick while creating the volume. A
mount point cannot be used as a brick directory, hence brick_dir should be a directory inside
the mount point.
This field can be left empty, in which case a directory will be created inside the mount point
with a default name. If the backend is not setup, then this field will be ignored. In case mount
points have to be used as brick directory, then use the force option in the volume section.
IMPORTANT
If you only want to create a volume and not setup the back-end, then provide
the absolute path of brick directories for each host specified in the [hosts]
section under this section along with the volume section.
For example:
[brick_dirs]
/rhgs/brick1
/rhgs/brick2
host-specific-data
This section is deprecated in gdeploy 2.0. Please see [backend-setup] for more details for
gdeploy 2.0. For the hosts (IP/hostname) listed under [hosts] section, each host can have its
own specific data. The following are the variables that are supported for hosts.
For example:
[10.0.0.1]
devices=/dev/vdb,/dev/vda
vgs=CUSTOM_vg1,CUSTOM_vg2
pools=CUSTOM_pool1,CUSTOM_pool2
lvs=CUSTOM_lv1,CUSTOM_lv2
mountpoints=/rhgs/brick1,/rhgs/brick2
brick_dirs=b1,b2
peer
This section specifies the configuration for the Trusted Storage Pool management (TSP). This
section helps in making all the hosts specified in the [hosts] section either probe each other
to create the trusted storage pool or detach all of them from the trusted storage pool. The only
option in this section is 'action', which can take the value probe or detach.
For example:
[peer]
action=probe
clients
This section specifies the client hosts and client_mount_points to mount the gluster storage
volume created. The 'action' option must be specified for the framework to determine the
action that has to be performed. The options are 'mount' and 'unmount'. The client hosts field
is mandatory. If the mount points are not specified, the default /mnt/gluster is used for all
the hosts.
The option fstype specifies how the gluster volume is to be mounted. The default is glusterfs
(FUSE mount). The volume can also be mounted as NFS. Each client can have a different type of
volume mount, which has to be specified as comma-separated values. The following fields are
included:
* action
* hosts
* fstype
* client_mount_points
For example:
[clients]
action=mount
hosts=10.0.0.10
fstype=nfs
nfs-version=3
client_mount_points=/mnt/rhs
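As a sketch of the comma-separated form described above, two clients could mount the same volume with different fstypes as follows (hosts, mount points, and volume name are illustrative):
[clients]
action=mount
hosts=10.0.0.10,10.0.0.11
fstype=glusterfs,nfs
client_mount_points=/mnt/gluster,/mnt/nfs
volname=glustervol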
volume
The section specifies the configuration options for the volume. The following fields are
included in this section:
* action
* volname
* transport
* replica
* replica_count
* disperse
* disperse_count
* redundancy_count
* force
action
This option specifies what action must be performed in the volume. The choices can be
[create, delete, add-brick, remove-brick].
delete: If the delete choice is used, all the options other than 'volname' will be ignored.
volname
This option specifies the volume name. The default name is glustervol.
transport
This option specifies the transport type. Default is tcp. Options are tcp or rdma or tcp,rdma.
replica
This option specifies whether the volume should be of type replica. The options are yes and no.
The default is no. If 'replica' is set to yes, 'replica_count' must be provided.
disperse
This option specifies if the volume should be of type disperse. Options are yes and no.
Default is no.
disperse_count
This field is optional even if 'disperse' is yes. If not specified, the number of bricks specified
in the command line is taken as the disperse_count value.
redundancy_count
If this value is not specified, and if 'disperse' is yes, its default value is computed so that it
generates an optimal configuration.
force
This is an optional field and can be used during volume creation to forcefully create the
volume.
For example:
[volume]
action=create
volname=glustervol
transport=tcp,rdma
replica=yes
replica_count=3
force=yes
backend-setup
Available in gdeploy 2.0. This section sets up the backend for using with GlusterFS volume. If
more than one backend-setup has to be done, they can be done by numbering the section like
[backend-setup1], [backend-setup2], ...
devices: This replaces the [pvs] section in gdeploy 1.x. devices variable lists the raw disks
which should be used for backend setup. For example:
[backend-setup]
devices=sda,sdb,sdc
dalign:
The Logical Volume Manager can use a portion of the physical volume for storing its
metadata while the rest is used as the data portion. Align the I/O at the Logical Volume
Manager (LVM) layer using the dalign option while creating the physical volume. For
example:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=256k
For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should
be obtained by multiplying the RAID stripe unit size with the number of data disks. If 12
disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand,
if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe
unit size of 128 KiB:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=1280k
The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe
unit size of 256 KiB:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=1536k
To view the previously configured physical volume settings for the dalign option, run the
pvs -o +pe_start device command. For example:
vgs: This is an optional variable. This variable replaces the [vgs] section in gdeploy 1.x. vgs
variable lists the names to be used while creating volume groups. The number of VG names
should match the number of devices or should be left blank. gdeploy will generate names
for the VGs. For example:
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
A pattern can be provided for the vgs, like custom_vg{1..3}; this will create three vgs.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg{1..3}
pools: This is an optional variable. The variable replaces the [pools] section in gdeploy 1.x.
pools lists the thin pool names for the volume.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
Similar to vgs, a pattern can be provided for the thin pool names, for example custom_pool{1..3}.
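A sketch of the same pattern form applied to thin pool names (the device and name choices are illustrative):
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg{1..3}
pools=custom_pool{1..3}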
lvs: This is an optional variable. This variable replaces the [lvs] section in gdeploy 1.x. lvs
lists the logical volume name for the volume.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
lvs=custom_lv1,custom_lv2,custom_lv3
mountpoints: This is an optional variable. This variable replaces the [mountpoints] section in
gdeploy 1.x. It lists the mount points for the logical volumes. For example:
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
lvs=custom_lv1,custom_lv2,custom_lv3
mountpoints=/gluster/data1,/gluster/data2,/gluster/data3
ssd - This variable is set if caching has to be added. For example, the backend setup with ssd
for caching should be:
[backend-setup]
ssd=sdc
vgs=RHS_vg1
datalv=lv_data
cachedatalv=lv_cachedata:1G
cachemetalv=lv_cachemeta:230G
NOTE
Specifying the name of the data LV is necessary while adding an SSD. Make
sure the datalv is created already. Otherwise, ensure that it is created in one of
the earlier `backend-setup' sections.
PV
Available in gdeploy 2.0. If the user needs to have more control over setting up the backend,
and does not want to use the backend-setup section, then the pv, vg, and lv modules are to be used.
The pv module supports the following variables.
[pv]
action=create
devices=vdb,vdc,vdd
[pv:10.0.5.2]
action=create
devices=vdb,vdc,vdd
[pv]
action=resize
devices=vdb
expand=yes
[pv]
action=resize
devices=vdb
shrink=100G
dalign:
The Logical Volume Manager can use a portion of the physical volume for storing its
metadata while the rest is used as the data portion. Align the I/O at the Logical Volume
Manager (LVM) layer using the dalign option while creating the physical volume. For
example:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=256k
For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should
be obtained by multiplying the RAID stripe unit size with the number of data disks. If 12
disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand,
if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe
unit size of 128 KiB:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=1280k
The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe
unit size of 256 KiB:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=1536k
To view the previously configured physical volume settings for the dalign option, run the
pvs -o +pe_start device command.
You can also set the dalign option in the backend-setup section.
VG
Available in gdeploy 2.0. This module is used to create and extend volume groups. The vg
module supports the following variables.
pvname - The PVs to use to create the volume group. For more than one PV, use comma separated
values.
vgname - The name of the VG. If no name is provided, GLUSTER_vg is used as the default name.
one-to-one - If set to yes, a one-to-one mapping is done between PVs and VGs.
Example 1: Create a volume group named images_vg with two PVs:
[vg]
action=create
vgname=images_vg
pvname=sdb,sdc
Example 2: Create two VGs named rhgs_vg1 and rhgs_vg2 with two PVs:
[vg]
action=create
vgname=rhgs_vg
pvname=sdb,sdc
one-to-one=yes
Example 3: Extend an existing volume group named rhgs_images:
[vg]
action=extend
vgname=rhgs_images
pvname=sdc
LV
Available in gdeploy 2.0. This module is used to create, setup-cache, and convert logical
volumes. The lv module supports the following variables:
action - The action variable accepts the values create, setup-cache, convert, and change.
If the action is create, the following options are supported:
lvname: The name of the logical volume. This is an optional field; the default is GLUSTER_lv.
poolname - The name of the thin pool volume. This is an optional field; the default is
GLUSTER_pool.
lvtype - The type of the logical volume to be created; allowed values are thin and thick. This
is an optional field; the default is thick.
size - The size of the logical volume. The default is to take all available space on the VG.
force - Force the LV creation without asking any questions. Allowed values are yes and no. This is an
optional field; the default is yes.
chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By
default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the
default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default
chunksize is 256 KB. See Section 21.2, “Brick Configuration” for details.
virtualsize - Creates a thinly provisioned device or a sparse device of the given size
ssd - The name of the SSD device (for example, sda or vda) used to set up the cache.
cache_meta_lv - Due to requirements from dm-cache (the kernel driver), LVM further
splits the cache pool LV into two devices - the cache data LV and cache metadata LV.
Provide the cache_meta_lv name here.
force - Force the operation without prompting.
lvtype - The type of the LV; available options are thin and thick.
cachepool - This argument is necessary when converting a logical volume to a cache LV.
Name of the cachepool.
chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By
default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the
default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default
chunksize is 256 KB. See Section 21.2, “Brick Configuration” for details.
thinpool - Specifies the thin pool, or converts a logical volume into a thin pool's data volume.
The volume's name or path must be given.
[lv]
action=create
vgname=RHGS_vg1
poolname=lvthinpool
lvtype=thinpool
poolmetadatasize=200MB
chunksize=1024k
size=30GB
[lv]
action=create
vgname=RHGS_vg1
lvname=engine_lv
lvtype=thick
size=10GB
mount=/rhgs/brick1
If there is more than one LV, the LVs can be created by numbering the LV sections, for example
[lv1], [lv2], and so on.
RH-subscription
Available in gdeploy 2.0. This module is used to subscribe, unsubscribe, attach, and enable
repositories. The RH-subscription module supports the following variables:
auto-attach: true/false
disable-repos: Repo names to disable. Leaving this option blank will disable all the repos.
[RH-subscription1]
action=register
username=qa@redhat.com
password=<passwd>
pool=<pool>
ignore_register_errors=no
[RH-subscription2]
action=disable-repos
repos=
[RH-subscription3]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rhel-7-
server-rhev-mgmt-agent-rpms
ignore_enable_errors=no
yum
Available in gdeploy 2.0. This module is used to install or remove RPM packages. The yum
module can also be used to add repositories at install time.
For example:
[yum1]
action=install
gpgcheck=no
# Repos should be a URL; eg: http://repo-pointing-glusterfs-builds
repos=<glusterfs.repo>,<vdsm.repo>
packages=vdsm,vdsm-gluster,ovirt-hosted-engine-setup,screen,gluster-
nagios-addons,xauth
update=yes
[yum2:host1]
action=install
gpgcheck=no
packages=rhevm-appliance
shell
Available in gdeploy 2.0. This module allows the user to run shell commands on the remote nodes.
Currently, shell provides a single action value, execute, and a command variable that takes any
valid shell command as its value.
[shell]
action=execute
command=vdsm-tool configure --force
update-file
Available in gdeploy 2.0. The update-file module allows users to copy a file, edit a line in a file, or
add new lines to a file. The action variable can be any of copy, edit, or add.
When the action variable is set to copy, the following variables are supported.
dest - The destination path on the remote machine to which the file is to be copied.
When the action variable is set to edit, the following variables are supported.
replace - A regular expression that matches the line to be replaced.
When the action variable is set to add, the following variables are supported.
line - The line that has to be added to the file. The line is added towards the end of the file.
Example 1: Copy a file to a remote machine:
[update-file]
action=copy
src=/tmp/foo.cfg
dest=/etc/nagios/nrpe.cfg
Example 2: Edit a line on the remote machine. In the example below, lines that contain
allowed_hosts are replaced with allowed_hosts=host.redhat.com:
[update-file]
action=edit
dest=/etc/nagios/nrpe.cfg
replace=allowed_hosts
line=allowed_hosts=host.redhat.com
Example 3: Add a line to the end of a file:
[update-file]
action=add
dest=/etc/ntp.conf
line=server clock.redhat.com iburst
service
Available in gdeploy 2.0. The service module allows the user to start, stop, restart, reload, enable,
or disable a service. The action variable specifies one of these values.
When the action variable is set to any of start, stop, restart, reload, enable, or disable, the
servicename variable specifies which service to act on. For example:
[service1]
action=enable
service=ntpd
[service2]
action=restart
service=ntpd
script
Available in gdeploy 2.0. The script module enables the user to execute a script or binary on the
remote machine. The action variable is set to execute, and the module allows the user to specify
two variables, file and args.
Example: Execute the script disable-multipath.sh on all the remote nodes listed in the hosts section:
[script]
action=execute
file=/usr/share/ansible/gdeploy/scripts/disable-multipath.sh
firewalld
Available in gdeploy 2.0. The firewalld module allows the user to manipulate firewall rules. The
action variable supports two values, add and delete. Both add and delete support the following
variables:
permanent - Whether to make the entry permanent. Allowed values are true and false.
For example:
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-
6923/tcp,5666/tcp,16514/tcp
services=glusterfs
NFS-Ganesha is a user space file server for the NFS protocol. For more information about NFS-Ganesha
see https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-nfs#sect-NFS_Ganesha
5.1.8.1. Prerequisites
Add the following details to the configuration file to register with the subscription manager:
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
# gdeploy -c <config_file_name>
Enabling Repos
To enable the required repos, add the following details in the configuration file:
[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-
nfs-for-rhel-7-server-rpms,rhel-ha-for-rhel-7-server-rpms
# gdeploy -c <config_file_name>
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota
NOTE
To ensure that NFS client UDP mounts do not fail, add port 2049/udp to the
[firewalld] section of gdeploy.
# gdeploy -c <config_file_name>
[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-ganesha
# gdeploy -c <config_file_name>
The NFS Ganesha module in gdeploy allows the user to perform the following actions:
Creating a Cluster
Destroying a Cluster
Adding a Node
Exporting a Volume
Unexporting a Volume
Creating a Cluster
This action creates a fresh NFS-Ganesha setup on a given volume. For this action, the nfs-ganesha
section in the configuration file supports the following variables:
cluster-nodes: This is a required argument. This variable expects comma separated values of
cluster node names, which are used to form the cluster.
vip: This is a required argument. This variable expects a comma separated list of IP addresses.
These will be the virtual IP addresses.
volname: This is an optional variable if the configuration contains the [volume] section.
For example, to create an NFS-Ganesha cluster, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-
6923/tcp,5666/tcp,16514/tcp,662/tcp,662/udp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota
[volume]
action=create
volname=ganesha
transport=tcp
replica_count=2
force=yes
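The excerpt above does not show the [nfs-ganesha] section itself; a minimal sketch of what it could look like, assuming the action name create-cluster and placeholder virtual IP addresses, is:
[nfs-ganesha]
action=create-cluster
cluster-nodes=host-1.example.com,host-2.example.com
vip=10.0.0.1,10.0.0.2
volname=ganesha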
In the above example, it is assumed that the required packages are installed, a volume is created and
NFS-Ganesha is enabled on it.
If you have upgraded to Red Hat Enterprise Linux 7.4, then enable the gluster_use_execmem
boolean by executing the following command:
# setsebool -P gluster_use_execmem on
# gdeploy -c <config_file_name>
Destroying a Cluster
The destroy-cluster action disables NFS-Ganesha. It allows one variable, cluster-nodes.
For example, to destroy an NFS-Ganesha cluster, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
[nfs-ganesha]
action=destroy-cluster
cluster-nodes=host-1.example.com,host-2.example.com
# gdeploy -c <config_file_name>
Adding a Node
The add-node action allows three variables:
nodes: Accepts a list of comma separated hostnames that have to be added to the cluster.
cluster_nodes: Accepts a list of comma separated nodes of the NFS-Ganesha cluster.
vip: Accepts the virtual IP address to be assigned to the new node, as shown in the example below.
For example, to add a node, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com
[peer]
action=probe
[clients]
action=mount
volname=gluster_shared_storage
hosts=host-3.example.com
fstype=glusterfs
client_mount_points=/var/run/gluster/shared_storage/
[nfs-ganesha]
action=add-node
nodes=host-3.example.com
cluster_nodes=host-1.example.com,host-2.example.com
vip=10.0.0.33
# gdeploy -c <config_file_name>
NOTE
To delete a node, refer to Deleting a node in the cluster under Section 6.2.3.4.2, “Deleting
a Node in the Cluster”.
Exporting a Volume
This action exports a volume. The export-volume action supports one variable, volname.
For example, to export a volume, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
[nfs-ganesha]
action=export-volume
volname=ganesha
# gdeploy -c <config_file_name>
Unexporting a Volume
This action unexports a volume. The unexport-volume action supports one variable, volname.
For example, to unexport a volume, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
[nfs-ganesha]
action=unexport-volume
volname=ganesha
# gdeploy -c <config_file_name>
Refreshing Configuration
The refresh-config action supports the following variables:
del-config-lines
block-name
volname
ha-conf-dir
Example 1 - To add a client block and run refresh-config, add the following details to the configuration
file:
NOTE
If a client block already exists, the user must manually delete it before making
any other modifications.
[hosts]
host1-example.com
host2-example.com
[nfs-ganesha]
action=refresh-config
# Default block name is `client'
block-name=client
config-block=clients = 10.0.0.1;|allow_root_access = true;|access_type =
"RO";|Protocols = "2", "3";|anonymous_uid = 1440;|anonymous_gid = 72;
volname=ganesha
# gdeploy -c <config_file_name>
Example 2 - To delete a line and run refresh-config, add the following details to the configuration file:
[hosts]
host1-example.com
host2-example.com
[nfs-ganesha]
action=refresh-config
del-config-lines=client
volname=ganesha
# gdeploy -c <config_file_name>
Example 3 - To run refresh-config on a volume, add the following details to the configuration file:
[hosts]
host1-example.com
host2-example.com
[nfs-ganesha]
action=refresh-config
volname=ganesha
# gdeploy -c <config_file_name>
5.1.9.1. Prerequisites
Add the following details to the configuration file to register with the subscription manager:
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
# gdeploy -c <config_file_name>
Enabling Repos
To enable the required repos, add the following details in the configuration file:
[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-
samba-for-rhel-7-server-rpms
# gdeploy -c <config_file_name>
[firewalld]
action=add
ports=54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,4379/tcp
services=glusterfs,samba,high-availability
# gdeploy -c <config_file_name>
[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=samba,samba-client,glusterfs-server,ctdb
# gdeploy -c <config_file_name>
For example, to enable Samba on an existing volume, add the following details to the configuration file:
[hosts]
10.70.37.192
10.70.37.88
[volume]
action=smb-setup
volname=samba1
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb
NOTE
Ensure that the hosts are not part of the CTDB cluster.
# gdeploy -c <config_file_name>
For example, to enable Samba while creating a volume, add the following details to the configuration
file:
[hosts]
10.70.37.192
10.70.37.88
[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick
[volume]
action=create
volname=samba1
smb=yes
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb
# gdeploy -c <config_file_name>
NOTE
Using CTDB requires setting up a separate volume in order to protect the CTDB lock file. Red Hat
recommends a replicated volume where the replica count is equal to the number of servers being used
as Samba servers.
The following configuration file sets up a CTDB volume across two hosts that are also Samba servers.
[hosts]
10.70.37.192
10.70.37.88
[volume]
action=create
volname=ctdb
transport=tcp
replica_count=2
force=yes
[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
volname=ctdb
You can configure the CTDB cluster to use separate IP addresses by using the ctdb_nodes
parameter, as shown in the following example.
[hosts]
10.70.37.192
10.70.37.88
[volume]
action=create
volname=ctdb
transport=tcp
replica_count=2
force=yes
[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
ctdb_nodes=192.168.1.1,192.168.2.5
volname=ctdb
# gdeploy -c <config_file_name>
To create a volume and enable SSL on it, add the following details to the configuration file:
[hosts]
10.70.37.147
10.70.37.47
[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick
[volume]
action=create
volname=vol1
transport=tcp
replica_count=2
force=yes
enable_ssl=yes
ssl_clients=10.70.37.107,10.70.37.173
brick_dirs=/data/1
[clients]
action=mount
hosts=10.70.37.173,10.70.37.107
volname=vol1
fstype=glusterfs
client_mount_points=/mnt/data
In the above example, a volume named vol1 is created and SSL is enabled on it. gdeploy creates
self-signed certificates.
After adding the details to the configuration file, execute the following command to run the
configuration file:
# gdeploy -c <config_file_name>
To enable SSL on an existing volume, add the following details to the configuration file:
[hosts]
10.70.37.147
10.70.37.47
[volume]
action=enable-ssl
volname=vol2
ssl_clients=10.70.37.107,10.70.37.173
[clients2]
action=mount
hosts=10.70.37.173,10.70.37.107
volname=vol2
fstype=glusterfs
client_mount_points=/mnt/data
After adding the details to the configuration file, execute the following command to run the
configuration file:
# gdeploy -c <config_file_name>
To limit the resources available to glusterd on a Red Hat Enterprise Linux 7 based installation of Red
Hat Gluster Storage 3.2 or higher, define slice_setup=yes when you start the glusterd service. This
applies a set of resource limitations to the glusterd service and all of its child processes.
[hosts]
192.168.100.101
192.168.100.102
192.168.100.103
[service]
action=start
service=glusterd
slice_setup=yes
The resource limitations set cannot be customized using gdeploy, but they can be manually modified
outside the scope of gdeploy, for example, by using systemctl.
If you use a version of Red Hat Gluster Storage that is based on Red Hat Enterprise Linux 6, you cannot
set up resource management using gdeploy. See Chapter 20, Managing Resource Usage for details.
For more information about resource management, see the Red Hat Enterprise Linux Resource
Management Guide.
You can change the log location by setting a different location as the value of the GDEPLOY_LOGFILE
environment variable. For example, to set the gdeploy log location to
/var/log/gdeploy/gdeploy.log for this session, run the following command:
$ export GDEPLOY_LOGFILE=/var/log/gdeploy/gdeploy.log
To persistently set this as the default log location for this user, add the same command as a separate
line in the /home/username/.bash_profile file for that user.
Heketi provides a RESTful management interface which can be used to manage the lifecycle of Red Hat
Gluster Storage volumes. With Heketi, cloud services like OpenStack Manila, Kubernetes, and
OpenShift can dynamically provision Red Hat Gluster Storage volumes with any of the supported
durability types. Heketi will automatically determine the location for bricks across the cluster, making
sure to place bricks and their replicas across different failure domains. Heketi also supports any number
of Red Hat Gluster Storage clusters, allowing cloud services to provide network file storage without
being limited to a single Red Hat Gluster Storage cluster.
With Heketi, the administrator no longer manages or configures bricks, disks, or trusted storage pools.
The Heketi service manages all of this hardware for the administrator, enabling it to allocate storage on
demand. Any disks registered with Heketi must be provided in raw format; Heketi then manages them
using LVM.
NOTE
The replica 3 volume type is the default and the only supported volume type that can be
created using Heketi.
Heketi can be configured and executed using the CLI or the API. The sections ahead describe
configuring Heketi using the CLI.
5.2.1. Prerequisites
Heketi requires SSH access to the nodes that it will manage. Hence, ensure that the following
requirements are met:
SSH Access
Must be able to run sudo commands from SSH. This requires disabling requiretty in the
/etc/sudoers file
Start the glusterd service after Red Hat Gluster Storage is installed.
NOTE
After installing Red Hat Gluster Storage 3.3, execute the following command to install the heketi-
client:
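The install command is not shown here; a minimal sketch, assuming the package name heketi-client:
# yum install heketi-client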
The heketi-client package has the binary for the heketi command line tool.
For more information about subscribing to the required channels and installing Red Hat Gluster
Storage, see the Red Hat Gluster Storage Installation Guide.
Generate the passphrase-less SSH keys for the nodes which are going to be part of the trusted
storage pool by running the following command:
Change the owner and the group permissions for the heketi keys using the following
command:
Set up password-less SSH access between Heketi and the Red Hat Gluster Storage servers by
running the following command:
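The commands for these steps are not reproduced here; the following is an illustrative sketch, assuming the key is stored as /etc/heketi/heketi_key and the service runs as the heketi user (both assumptions):
# ssh-keygen -f /etc/heketi/heketi_key -t rsa -N ''
# chown heketi:heketi /etc/heketi/heketi_key*
# ssh-copy-id -i /etc/heketi/heketi_key.pub root@server1.example.com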
Set up the heketi.json configuration file. The file is located at /etc/heketi/heketi.json. The
configuration file has the information required to run the Heketi server. The config file must be
in JSON format with the following settings:
key: string
user: map, Settings for the Heketi volume requests access user
key: string
executor: string, Determines the type of command executor to use. Possible values are:
mock: Does not send any commands out to servers. Can be used for development and tests
{
"_port_comment": "Heketi Server Port Number",
"port": "8080",
},
"_loglevel_comment": [
"Set log level. Choices are:",
" none, critical, error, warning, info, debug",
"Default is warning"
],
"loglevel" : "debug"
}
}
NOTE
The location for the private SSH key that is created must be set in the keyfile
setting of the configuration file, and the key should be readable by the heketi
user.
3. To check the status of the Heketi server, execute the following command:
# journalctl -u heketi
NOTE
After Heketi is configured to manage the trusted storage pool, gluster commands
should not be run on it, as this will make the heketidb inconsistent, leading to
unexpected behaviors with Heketi.
If Heketi is not set up with authentication, then use curl to verify the configuration:
# curl http://<server:port>/hello
You can also verify the configuration using the heketi-cli when authentication is enabled:
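A minimal sketch of such a check, assuming an admin user and secret key were configured in heketi.json (placeholders shown in angle brackets):
# heketi-cli --server http://<server:port> --user admin --secret <admin_key> cluster list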
5.2.4.1. Prerequisites
You have to determine the node failure domains and clusters of nodes. A failure domain is a value given
to a set of nodes which share the same switch, power supply, or anything else that would cause them to
fail at the same time. Heketi uses this information to make sure that replicas are created across failure
domains, thus providing cloud services with volumes that are resilient to both data unavailability and data
loss.
You have to determine which nodes would constitute a cluster. Heketi supports multiple Red Hat
Gluster Storage clusters, which gives cloud services the option of specifying a set of clusters where a
volume must be created. This provides cloud services and administrators the option of creating SSD,
SAS, SATA, or any other type of cluster that provides a specific quality of service to users.
NOTE
Heketi does not have a mechanism today to study and build its database from an
existing system. So, a new trusted storage pool has to be configured that can be used by
Heketi.
The command line client loads the information about creating a cluster, adding nodes to that cluster,
and then adding disks to each one of those nodes. This information is added to the topology file. To
load a topology file with heketi-cli, execute the following command:
NOTE
# export HEKETI_CLI_SERVER=http://<heketi_server:port>
# heketi-cli topology load --json=<topology_file>
Where topology_file is a file in JSON format describing the clusters, nodes, and disks to add to
Heketi. The format of the file is as follows:
Each element in the clusters array is a map which describes the cluster as follows.
Each element in the nodes array is a map which describes the node as follows:
node: Same as Node Add, except there is no need to supply the cluster ID.
zone: The value represents the failure domain in which the node exists.
For example:
1. Topology file:
{
"clusters": [
{
"nodes": [
{
"node": {
"hostnames": {
"manage": [
"10.0.0.1"
],
"storage": [
"10.0.0.1"
]
},
"zone": 1
},
"devices": [
"/dev/sdb",
"/dev/sdc",
"/dev/sdd",
"/dev/sde",
"/dev/sdf",
"/dev/sdg",
"/dev/sdh",
"/dev/sdi"
]
},
{
"node": {
"hostnames": {
"manage": [
"10.0.0.2"
],
"storage": [
"10.0.0.2"
]
},
"zone": 2
},
"devices": [
"/dev/sdb",
"/dev/sdc",
"/dev/sdd",
"/dev/sde",
"/dev/sdf",
"/dev/sdg",
"/dev/sdh",
"/dev/sdi"
]
},
.......
.......
1. Execute the following command to check the various options for creating a volume:
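The command is not reproduced here; a minimal sketch, assuming HEKETI_CLI_SERVER is exported as shown earlier:
# heketi-cli volume create --help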
2. For example, after setting up the topology file with two nodes in one failure domain and two
nodes in another failure domain, create a 100 GB volume using the following command:
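The command is not reproduced here; a minimal sketch, assuming the size is given in GB:
# heketi-cli volume create --size=100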
Bricks:
Id: 8998961142c1b51ab82d14a4a7f4402d
Path:
/var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_899
8961142c1b51ab82d14a4a7f4402d/brick
Size (GiB): 50
Node: b455e763001d7903419c8ddd2f58aea0
Device: 0ddba53c70537938f3f06a65a4a7e88b
…………….
2. This volume ID can be used as input to heketi-cli for expanding the volume.
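The expansion command is not reproduced here; a minimal sketch, using placeholder values for the volume ID and the additional size in GB:
# heketi-cli volume expand --volume=<volume_id> --expand-size=30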
For information on creating an encrypted disk, refer to the Disk Encryption Appendix of the Red Hat
Enterprise Linux 6 Installation Guide.
IMPORTANT
Red Hat supports formatting a Logical Volume using the XFS file system on the
bricks.
To create a thinly provisioned logical volume, proceed with the following steps:
1. Create a physical volume (PV) using the pvcreate command.
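The pvcreate example is not shown in this excerpt; a minimal sketch, with /dev/sdb and a 1280k alignment as placeholders:
# pvcreate --dataalignment 1280k /dev/sdb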
Use the correct dataalignment option based on your device. For more information, see
Section 21.2, “Brick Configuration”
NOTE
The device name and the alignment value will vary based on the device you are
using.
2. Create a Volume Group (VG) from the PV using the vgcreate command:
For example:
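A minimal sketch, with the VG name rhgs_vg as a placeholder and the PV from the previous step:
# vgcreate rhgs_vg /dev/sdb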
3. Create a thin pool using the lvcreate command. To enhance the performance of Red Hat Gluster
Storage, ensure you read Chapter 21, Tuning for Performance, before creating the pool.
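The thin pool command is not shown in this excerpt; a minimal sketch, with placeholder names and pool size:
# lvcreate --thin rhgs_vg/rhgs_pool --size 2T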
4. Create a thinly provisioned volume that uses the previously created pool by running the
lvcreate command with the --virtualsize and --thin options:
For example:
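A minimal sketch, with the pool from the previous step and placeholder names and sizes:
# lvcreate --virtualsize 1T --thin rhgs_vg/rhgs_pool --name rhgs_lv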
5. Format bricks using the supported XFS configuration, mount the bricks, and verify the bricks
are mounted correctly. To enhance the performance of Red Hat Gluster Storage, ensure you
read Chapter 21, Tuning for Performance before formatting the bricks.
IMPORTANT
Snapshots are not supported on bricks formatted with external log devices. Do
not use the -l logdev=device option with the mkfs.xfs command when formatting
the Red Hat Gluster Storage bricks.
DEVICE is the created thin LV. The inode size is set to 512 bytes to accommodate the
extended attributes used by Red Hat Gluster Storage.
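The formatting and mount commands are not reproduced here; a minimal sketch based on the mkfs.xfs invocation given later in this chapter, with placeholder device and mount point names:
# mkfs.xfs -f -i size=512 /dev/rhgs_vg/rhgs_lv
# mkdir -p /rhgs
# mount /dev/rhgs_vg/rhgs_lv /rhgs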
10. If SELinux is enabled, the SELinux labels have to be set manually for the bricks, using the
following commands:
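The labelling commands are not shown here; a sketch assuming the glusterd_brick_t SELinux type and the /rhgs/brick1 brick directory used elsewhere in this section:
# semanage fcontext -a -t glusterd_brick_t "/rhgs/brick1(/.*)?"
# restorecon -Rv /rhgs/brick1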
For example, the /rhgs directory is the mounted file system and is used as the brick for volume
creation. However, if for some reason the mount point becomes unavailable, any writes continue to
happen in the /rhgs directory, but these now land on the root file system.
During Red Hat Gluster Storage setup, create an XFS file system and mount it. After mounting, create a
subdirectory and use this subdirectory as the brick for volume creation. Here, the XFS file system is
mounted as /rhgs. After the file system is available, create a directory called /rhgs/brick1 and
use it for volume creation. Ensure that no more than one brick is created from a single mount. This
approach has the following advantages:
When the /rhgs file system is unavailable, the /rhgs/brick1 directory is no longer available in
the system. Hence, there is no data loss from writes going to a different location.
This does not require any additional file system for nesting.
# mkdir /rhgs/brick1
2. Create the Red Hat Gluster Storage volume using the subdirectories as bricks.
NOTE
If multiple bricks are used from the same server, then ensure the bricks are mounted in
the following format. For example:
# df -h
Create a distributed volume with 2 bricks from each server. For example:
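The create command is not shown here; an illustrative sketch with placeholder volume, host, and brick names, using two bricks from each of two servers:
# gluster volume create distvol server1:/rhgs1/brick1 server1:/rhgs2/brick2 server2:/rhgs1/brick1 server2:/rhgs2/brick2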
Run # mkfs.xfs -f -i size=512 device to reformat the brick to supported requirements, and
make it available for immediate reuse in a new volume.
NOTE
If the file system cannot be reformatted, remove the whole brick directory and create it again.
1. Delete all previously existing data in the brick, including the .glusterfs subdirectory.
3. Run # getfattr -d -m . brick to examine the attributes set on the volume. Take note of
the attributes.
4. Run # setfattr -x attribute brick to remove the attributes relating to the glusterFS
file system.
WARNING
Distributed volumes can suffer significant data loss during a disk or server failure
because directory contents are spread randomly across the bricks in the volume.
Use distributed volumes where scalable storage and redundancy is either not
important, or is provided by other hardware or software layers.
Use the gluster volume create command to create different types of volumes, and the gluster
volume info command to verify successful volume creation.
Prerequisites
A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the
Trusted Storage Pool”.
Understand how to start and stop volumes, as described in Section 5.11, “Starting Volumes”.
1. Run the gluster volume create command to create the distributed volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
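The command syntax is not reproduced here; a minimal sketch of the two-server case referenced below as Example 5.1, with placeholder host and brick paths:
# gluster volume create test-volume server1:/rhgs/brick1 server2:/rhgs/brick2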
3. Run gluster volume info command to optionally display the volume information.
The following output is the result of Example 5.1, “Distributed Volume with Two Storage
Servers”.
IMPORTANT
Creating replicated volume with replica count greater than 3 is under technology
preview. Technology Preview features are not fully supported under Red Hat service-
level agreements (SLAs), may not be functionally complete, and are not intended for
production use.
Tech Preview features provide early access to upcoming product innovations, enabling
customers to test functionality and provide feedback during the development process.
As Red Hat considers making future iterations of Technology Preview features generally
available, we will provide commercially reasonable efforts to resolve any reported issues
that customers experience when using these features.
Replicated volume creates copies of files across multiple bricks in the volume. Use replicated volumes
in environments where high-availability and high-reliability are critical.
Use gluster volume create to create different types of volumes, and gluster volume info to
verify successful volume creation.
Prerequisites
A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the
Trusted Storage Pool”.
Understand how to start and stop volumes, as described in Section 5.11, “Starting Volumes”.
WARNING
While a dummy node can be used as an interim solution for this problem, Red Hat
recommends that all volumes that currently use two-way replication are migrated
to use either arbitrated replication or three-way replication.
Two-way replicated volume creates two copies of files across the bricks in the volume. The number of
bricks must be a multiple of two for a replicated volume. To protect against server and disk failures, it is
recommended that the bricks of the volume are from different servers.
1. Run the gluster volume create command to create the replicated volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
The order in which bricks are specified determines how they are replicated with each other.
For example, every 2 bricks, where 2 is the replica count, forms a replica set. This is
illustrated in Figure 5.3, “Illustration of a Two-way Replicated Volume” .
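A minimal sketch of such a command for a two-way replicated volume, with placeholder host and brick paths:
# gluster volume create test-volume replica 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2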
3. Run gluster volume info command to optionally display the volume information.
IMPORTANT
You must set client-side quorum on replicated volumes to prevent split-brain scenarios.
For more information on setting client-side quorum, see Section 11.13.1.2, “Configuring
Client-Side Quorum”
Synchronous three-way replication is now fully supported in Red Hat Gluster Storage. It is
recommended that three-way replicated volumes use JBOD, but use of hardware RAID with three-way
replicated volumes is also supported.
1. Run the gluster volume create command to create the replicated volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
The order in which bricks are specified determines how bricks are replicated with each
other. For example, every three bricks, where 3 is the replica count, forms a replica set. This is
illustrated in Figure 5.4, “Illustration of a Three-way Replicated Volume” .
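A minimal sketch of such a command for a three-way replicated volume, with placeholder host and brick paths:
# gluster volume create test-volume replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3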
3. Run gluster volume info command to optionally display the volume information.
When sharding is enabled, files written to a volume are divided into pieces. The size of the pieces
depends on the value of the volume's features.shard-block-size parameter. The first piece is written to
a brick and given a GFID like a normal file. Subsequent pieces are distributed evenly between bricks in
the volume (sharded bricks are distributed by default), but they are written to that brick's .shard
directory, and are named with the GFID and a number indicating the order of the pieces. For example, if
a file is split into four pieces, the first piece is named GFID and stored normally. The other three pieces
are named GFID.1, GFID.2, and GFID.3 respectively. They are placed in the .shard directory and
distributed evenly between the various bricks in the volume.
Because sharding distributes files across the bricks in a volume, it lets you store files with a larger
aggregate size than any individual brick in the volume. Because the file pieces are smaller, heal
operations are faster, and geo-replicated deployments can sync the small pieces of a file that have
changed, rather than syncing the entire aggregate file.
Sharding also lets you increase volume capacity by adding bricks to a volume in an ad-hoc fashion.
Sharding has one supported use case: in the context of providing Red Hat Gluster Storage as a storage
domain for Red Hat Enterprise Virtualization, to provide storage for live virtual machine images. Note
that sharding is also a requirement for this use case, as it provides significant performance
improvements over previous implementations.
IMPORTANT
1. Set up a three-way replicated volume, as described in the Red Hat Gluster Storage
Administration Guide: https://access.redhat.com/documentation/en-US/red_hat_gluster_storage/3.3/html/Administration_Guide/sect-Creating_Replicated_Volumes.html#Creating_Three-way_Replicated_Volumes.
Sharding is enabled and configured at the volume level. The configuration options are as follows.
features.shard
Enables or disables sharding on a specified volume. Valid values are enable and disable. The
default value is disable.
Note that this only affects files created after this command is run; files created before this
command is run retain their old behaviour.
features.shard-block-size
Specifies the maximum size of the file pieces when sharding is enabled. The supported value for this
parameter is 512MB.
Note that this only affects files created after this command is run; files created before this
command is run retain their old behaviour.
When you enable sharding, you might want to check that it is working correctly, or see how a particular
file has been sharded across your volume.
To find the pieces of a file, you need to know that file's GFID. To obtain a file's GFID, run:
Once you have the GFID, you can run the following command on your bricks to see how this file has
been distributed:
Use distributed replicated volumes in environments where the requirement to scale storage, and high-
reliability is critical. Distributed replicated volumes also offer improved read performance in most
environments.
NOTE
The number of bricks must be a multiple of the replica count for a distributed replicated
volume. Also, the order in which bricks are specified has a great effect on data
protection. Each replica_count consecutive bricks in the list you give will form a replica
set, with all replica sets combined into a distribute set. To ensure that replica-set
members are not placed on the same node, list the first brick on every server, then the
second brick on every server in the same order, and so on.
Prerequisites
A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the
Trusted Storage Pool”.
Understand how to start and stop volumes, as described in Section 5.11, “Starting Volumes”.
WARNING
Support for two-way replication is planned for deprecation and removal in future
versions of Red Hat Gluster Storage. This will affect both replicated and
distributed-replicated volumes.
Support is being removed because two-way replication does not provide adequate
protection from split-brain conditions. While a dummy node can be used as an
interim solution for this problem, Red Hat recommends that all volumes that
currently use two-way replication are migrated to use either arbitrated replication
or three-way replication.
Two-way distributed replicated volumes distribute and create two copies of files across the bricks in a
volume. The number of bricks must be a multiple of the replica count for a replicated volume. To protect
against server and disk failures, the bricks of the volume should be from different servers.
1. Run the gluster volume create command to create the distributed replicated volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.6. Four Node Distributed Replicated Volume with a Two-way Replication
The order in which bricks are specified determines how they are replicated with each other.
For example, the first two bricks specified replicate each other where 2 is the replica count.
Example 5.7. Six Node Distributed Replicated Volume with a Two-way Replication
3. Run gluster volume info command to optionally display the volume information.
IMPORTANT
You must ensure to set server-side quorum and client-side quorum on the distributed-
replicated volumes to prevent split-brain scenarios. For more information on setting
quorums, see Section 11.13.1, “Preventing Split-brain”
Synchronous three-way distributed replication is now fully supported in Red Hat Gluster Storage. It is
recommended that three-way distributed replicated volumes use JBOD, but use of hardware RAID with
three-way distributed replicated volumes is also supported.
1. Run the gluster volume create command to create the distributed replicated volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.8. Six Node Distributed Replicated Volume with a Three-way Replication
The order in which bricks are specified determines how bricks are replicated with each
other. For example, the first 3 bricks, where 3 is the replica count, form a replica set.
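A minimal sketch of the six-node, three-way case referenced above, with placeholder host and brick paths:
# gluster volume create test-volume replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3 server4:/rhgs/brick4 server5:/rhgs/brick5 server6:/rhgs/brick6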
3. Run gluster volume info command to optionally display the volume information.
Better consistency
When an arbiter is configured, arbitration logic uses client-side quorum in auto mode to prevent file
operations that would lead to split-brain conditions.
Although arbitrated replicated volumes provide better data consistency than a two-way
replicated volume, because they store only metadata, they provide the same level of
availability as a two-way replicated volume. To achieve high-availability, you need to use a
three-way replicated volume instead of an arbitrated replicated volume.
Arbiters can only be configured for three-way replicated volumes. However, Red Hat Gluster
Storage can convert an existing two-way replicated volume into an arbitrated replicated
volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.
The minimum system requirements for a node that contains an arbiter brick differ depending on the
configuration choices made by the administrator. See Section 5.8.4, “Creating multiple arbitrated
replicated volumes across fewer total nodes” for details about the differences between the dedicated
arbiter and chained arbiter configurations.
Configuration type | Min CPU | Min RAM | NIC | Arbiter Brick Size | Max Latency
[a] More RAM may be necessary depending on the combined capacity of the number of arbiter bricks on the node.
[b] Arbiter and data bricks can be configured on the same device provided that the data and arbiter bricks belong to
different replica sets. See Section 5.8.1.2, “Arbiter capacity requirements” for further details on sizing arbiter volumes.
[c] Multiple bricks can be created on a single RAIDed physical device. Please refer the following product documentation:
Section 21.2, “Brick Configuration”
minimum 4 vCPUs
minimum 16 GB RAM
maximum 5 ms latency
Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller
than the other bricks in the volume or replica set. The required size for an arbiter brick depends on the
number of files being stored on the volume.
The recommended minimum arbiter brick size can be calculated with the following formula. For
example, if you have two 1 TB data bricks, and the average size of the files is 2 GB, then the
recommended minimum size for your arbiter brick is 2 MB, as shown in the following calculation:
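The formula is not reproduced in this excerpt; the following reconstruction is inferred from the worked example above, which assumes roughly 4 KB of metadata per file stored on the data brick:
minimum arbiter brick size = 4 KB * (size of largest data brick in the replica set / average file size)
For the example above: 4 KB * (1 TB / 2 GB) = 4 KB * 512 = 2048 KB = 2 MB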
If sharding is enabled, and your shard-block-size is smaller than the average file size in KB, then you
need to use the following formula instead, because each shard also has a metadata file:
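This variant of the formula is also not reproduced; a reconstruction under the same assumption of roughly 4 KB per entry, counting one entry per shard block:
minimum arbiter brick size = 4 KB * (size of largest data brick in the replica set / shard block size)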
Alternatively, if you know how many files you will store in a volume, the recommended minimum
arbiter brick size is the maximum number of files multiplied by 4 KB. For example, if you expect to have
200,000 files on your volume, your arbiter brick should be at least 800,000 KB, or 0.8 GB, in size.
Red Hat also recommends overprovisioning where possible so that there is no short-term need to
increase the size of the arbiter brick.
Arbiter and 1 data brick available: If the arbiter does not agree with the available data
node, write operations fail with ENOTCONN (since the brick that is correct is not available).
Other file operations are permitted.
Arbiter down, data bricks available: All file operations are permitted. The arbiter's
records are healed when it becomes available.
Only one brick available: If the available brick is a data brick, client quorum is
not met, and the volume enters an EROFS state.
This creates a volume with one arbiter for every three replicate bricks. The arbiter is the last brick in
every set of three bricks.
In the following example, the bricks on server3 and server6 are the arbiter bricks.
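The create command is not reproduced here; a minimal sketch matching the description above (six bricks, with the third and sixth acting as arbiters), using placeholder host and brick paths:
# gluster volume create testvol replica 3 arbiter 1 server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/arbiter-brick1 server4:/rhgs/brick4 server5:/rhgs/brick5 server6:/rhgs/arbiter-brick2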
5.8.4. Creating multiple arbitrated replicated volumes across fewer total nodes
If you are configuring more than one arbitrated-replicated volume, or a single volume with multiple
replica sets, you can use fewer nodes in total by using either of the following techniques:
Chain multiple arbitrated replicated volumes together, by placing the arbiter brick for one
volume on the same node as a data brick for another volume. Chaining is useful for write-
heavy workloads when file size is closer to metadata file size (that is, from 32–128 KiB). This
avoids all metadata I/O going through a single disk.
In arbitrated distributed-replicated volumes, you can also place an arbiter brick on the same
node as another replica sub-volume's data brick, since these do not share the same data.
Place the arbiter bricks from multiple volumes on a single dedicated node. A dedicated arbiter
node is suited to write-heavy workloads with larger files, and read-heavy workloads.
The following commands create two arbitrated replicated volumes, firstvol and secondvol. Server3
contains the arbiter bricks of both volumes.
Two gluster volumes configured across five servers to create two three-way arbitrated replicated
volumes, with the arbiter bricks on a dedicated arbiter node.
The following command configures an arbitrated replicated volume with six sub-volumes chained
across six servers in a 6 x (2 + 1) configuration.
Six replicated gluster sub-volumes chained across six servers to create a 6 * (2 + 1) arbitrated
distributed-replicated configuration.
For example, if you have an existing two-way replicated volume called testvol, and a new brick for the
arbiter to use, you can add a brick as an arbiter with the following command:
If you have an existing two-way distributed-replicated volume, you need a new brick for each sub-
volume in order to convert it to an arbitrated distributed-replicated volume, for example:
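The add-brick commands are not reproduced here; minimal sketches with placeholder volume names and brick paths. For a single replica set one new arbiter brick is added; for a distributed-replicated volume, one new brick per sub-volume is added in the same command:
# gluster volume add-brick testvol replica 3 arbiter 1 server3:/rhgs/arbiter-brick
# gluster volume add-brick disttestvol replica 3 arbiter 1 server5:/rhgs/arbiter-brick1 server6:/rhgs/arbiter-brick2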
For dedicated arbiter nodes, use JBOD for arbiter bricks, and RAID-6 for data bricks.
For chained arbiter volumes, use the same RAID-6 drive for both data and arbiter bricks.
See Chapter 21, Tuning for Performance for more information on enhancing performance that is not
specific to the use of arbiter volumes.
Dispersed volume requires less storage space when compared to a replicated volume. It is equivalent
to a replicated pool of size two, but requires 1.5 TB instead of 2 TB to store 1 TB of data when the
redundancy level is set to 2. In a dispersed volume, each brick stores some portions of data and parity
or redundancy. The dispersed volume sustains the loss of data based on the redundancy level.
The data protection offered by erasure coding can be represented in simple form by the following
equation: n = k + m. Here n is the total number of bricks, we would require any k bricks out of n
bricks for recovery. In other words, we can tolerate failure up to any m bricks. With this release, the
following configurations are supported:
For optimal fault tolerance, create each brick on a separate server. Creating multiple bricks on a single
server is supported, but the more bricks there are on a single server, the greater the risk to availability
and consistency when that single server becomes unavailable.
Use gluster volume create to create different types of volumes, and gluster volume info to
verify successful volume creation.
Prerequisites
Create a trusted storage pool as described in Section 4.1, “Adding Servers to the Trusted
Storage Pool”.
Understand how to start and stop volumes, as described in Section 5.11, “Starting Volumes”.
1. Run the gluster volume create command to create the dispersed volume.
The number of bricks required to create a disperse volume is the sum of disperse-data
count and redundancy count.
The disperse-data count option specifies the number of bricks that is part of the
dispersed volume, excluding the count of the redundant bricks. For example, if the total
number of bricks is 6 and redundancy-count is specified as 2, then the disperse-data count
is 4 (6 - 2 = 4). If the disperse-data count option is not specified, and only the
redundancy count option is specified, then the disperse-data count is computed
automatically by deducting the redundancy count from the specified total number of bricks.
Redundancy determines how many bricks can be lost without interrupting the operation of the
volume. If redundancy count is not specified, based on the configuration it is computed
automatically to the optimal value and a warning message is displayed.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
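A minimal sketch of such a command for the 4 + 2 configuration described above (disperse-data 4, redundancy 2) across six servers, with placeholder host and brick paths:
# gluster volume create test-volume disperse-data 4 redundancy 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3 server4:/rhgs/brick4 server5:/rhgs/brick5 server6:/rhgs/brick6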
IMPORTANT
The open-behind volume option is enabled by default. If you are accessing the
dispersed volume using the SMB protocol, you must disable the open-behind
volume option to avoid performance bottleneck on large file workload. Run the
following command to disable open-behind volume option:
3. Run gluster volume info command to optionally display the volume information.
Use gluster volume create to create different types of volumes, and gluster volume info to
verify successful volume creation.
Prerequisites
A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the
Trusted Storage Pool”.
Understand how to start and stop volumes, as described in Section 5.11, “Starting Volumes”.
IMPORTANT
Red Hat recommends that you review the Distributed Dispersed Volume configuration
recommendations explained in Section 11.14, “Recommended Configurations -
Dispersed Volume” before creating the Distributed Dispersed volume.
1. Run the gluster volume create command to create the dispersed volume.
The default value for transport is tcp. Other options can be passed such as auth.allow or
auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
server5:/rhgs11/brick11 server6:/rhgs12/brick12
Creation of test-volume has been successful
Please start the volume to access data.
The above example is illustrated in Figure 5.7, “Illustration of a Dispersed Volume” . In the
illustration and example, you are creating 12 bricks from 6 servers.
IMPORTANT
The open-behind volume option is enabled by default. If you are accessing the
distributed dispersed volume using the SMB protocol, you must disable the
open-behind volume option to avoid performance bottleneck on large file
workload. Run the following command to disable open-behind volume option:
3. Run gluster volume info command to optionally display the volume information.
NOTE
Every volume that is created is exported by default through the SMB protocol. If you
want to disable it, please refer Section 6.3.5, “Disabling SMB Shares” before starting the
volume.
SMB Yes No No No No
NFS-Ganesha No No Yes No No
[a] For more information, refer Section 6.5, “Managing Object Store”.
SMB Yes No
IMPORTANT
Red Hat Gluster Storage requires certain ports to be open. You must ensure that the
firewall settings allow access to the ports listed at Chapter 3, Considerations for Red Hat
Gluster Storage.
This section introduces Native Client and explains how to install the software on client machines. This
section also describes how to mount Red Hat Gluster Storage volumes on clients (both manually and
automatically) and how to verify that the Red Hat Gluster Storage volume has mounted successfully.
Red Hat Enterprise Linux version | Red Hat Gluster Storage version | Native client version
WARNING
If you want to access a volume being provided by a server using Red Hat Gluster
Storage 3.1.3 or higher, your client must also be using Red Hat Gluster Storage
3.1.3 or higher. Accessing these volumes from earlier client versions can result in
data becoming unavailable and problems with directory operations. This
requirement exists because Red Hat Gluster Storage 3.1.3 changed how the
Distributed Hash Table works in order to improve directory consistency and
remove the effects seen in BZ#1115367 and BZ#1118762.
NOTE
If an existing Red Hat Gluster Storage 2.1 cluster is upgraded to Red Hat Gluster Storage
3.x, older 2.1 based clients can mount the new 3.x volumes, however, clients must be
upgraded to Red Hat Gluster Storage 3.x to run rebalance operation. For more
information, see Section 6.1.3, “Mounting Red Hat Gluster Storage Volumes”
IMPORTANT
All clients must be of the same version. Red Hat strongly recommends upgrading the
servers before upgrading the clients.
Use the Command Line to Register and Subscribe a System to Red Hat Network
Register the system using the command line, and subscribe to the correct channels.
Prerequisites
Know the user name and password of the Red Hat Network (RHN) account with Red Hat
Gluster Storage entitlements.
# rhn_register
2. In the Operating System Release Version screen, select All available updates
and follow the prompts to register the system to the standard base channel of the respective
Red Hat Enterprise Linux Server version.
3. Run the rhn-channel --add --channel command to subscribe the system to the correct
Red Hat Gluster Storage Native Client channel:
For Red Hat Enterprise Linux 7.x clients using Red Hat Satellite Server:
NOTE
The following command can also be used, but Red Hat Gluster Storage may
deprecate support for this channel in future releases.
# yum repolist
Use the Command Line to Register and Subscribe a System to Red Hat Subscription Management
Register the system using the command line, and subscribe to the correct repositories.
Prerequisites
Know the user name and password of the Red Hat Subscription Manager account with Red Hat
Gluster Storage entitlements.
1. Run the subscription-manager register command and enter your Red Hat Subscription
Manager user name and password to register the system with Red Hat Subscription Manager.
2. Depending on your client, run one of the following commands to subscribe to the correct
repositories.
NOTE
The following command can also be used, but Red Hat Gluster Storage may
deprecate support for this repository in future releases.
For more information, see Section 3.2 Registering from the Command Line in Using and
Configuring Red Hat Subscription Management.
# yum repolist
Register the system using the web interface, and subscribe to the correct channels.
Prerequisites
Know the user name and password of the Red Hat Network (RHN) account with Red Hat
Gluster Storage entitlements.
2. Move the mouse cursor over the Subscriptions link at the top of the screen, and then click
the Registered Systems link.
3. Click the name of the system to which the Red Hat Gluster Storage Native Client
channel must be appended.
5. Expand the node for Additional Services Channels for Red Hat Enterprise Linux 7 for
x86_64 or Red Hat Enterprise Linux 6 for x86_64 or for Red Hat Enterprise
Linux 5 for x86_64 depending on the client platform.
When the page refreshes, select the Details tab to verify the system is subscribed to the
appropriate channels.
Prerequisites
Use the Command Line to Register and Subscribe a System to Red Hat Network or
Use the Command Line to Register and Subscribe a System to Red Hat Subscription
Management or
1. Run the yum install command to install the native client RPM packages.
2. For Red Hat Enterprise 5.x client systems, run the modprobe command to load FUSE modules
before mounting Red Hat Gluster Storage volumes.
# modprobe fuse
WARNING
If you want to access a volume being provided by a server using Red Hat Gluster
Storage 3.1.3 or higher, your client must also be using Red Hat Gluster Storage
3.1.3 or higher. Accessing these volumes from earlier client versions can result in
data becoming unavailable and problems with directory operations. This
requirement exists because Red Hat Gluster Storage 3.1.3 changed how the
Distributed Hash Table works in order to improve directory consistency and
remove the effects seen in BZ#1115367 and BZ#1118762.
# umount /mnt/glusterfs
After mounting a volume, test the mounted volume using the procedure described in Section 6.1.3.4,
“Testing Mounted Volumes”.
NOTE
Clients should be on the same version as the server, or at most on the version
immediately previous to the server version. For Red Hat Gluster Storage 3.3, the
recommended native client version is either 3.3.z or 3.2.z. For other versions, see
Section 6.1, “Native Client”.
Server names selected during volume creation should be resolvable in the client
machine. Use appropriate /etc/hosts entries, or a DNS server to resolve
server names to IP addresses.
IMPORTANT
Mounting a subdirectory using Native Client is under Technology Preview. Technology
Preview features are not fully supported under Red Hat service-level agreements
(SLAs), may not be functionally complete, and are not intended for production use.
Technology Preview features provide early access to upcoming product innovations,
enabling customers to test functionality and provide feedback during the development
process. As Red Hat considers making future iterations of Technology Preview features
generally available, we will make commercially reasonable efforts to resolve any
reported issues that customers experience when using these features.
The following options are available when using the mount -t glusterfs command. All options must
be separated with commas.
backup-volfile-servers=<volfile_server2>:<volfile_server3>:...:<volfile_serverN>
List of the backup volfile servers to mount the client. If this option is specified while mounting the
fuse client, when the first volfile server fails, the servers specified in backup-volfile-servers
option are used as volfile servers to mount the client until the mount is successful.
log-level
Logs only messages at or above the specified severity level in the log file.
log-file
Logs the messages in the specified file.
transport-type
Specifies the transport type that the FUSE client must use to communicate with bricks. If the volume
was created with only one transport type, that type is the default when no value is specified.
For a tcp,rdma volume, tcp is the default.
ro
Mounts the file system as read only.
acl
Enables POSIX Access Control List on mount. See Section 6.4.4, “Checking ACL enablement on a
mounted volume” for further information.
background-qlen=n
Enables FUSE to queue n requests before subsequent requests are denied.
The default value of n is 64.
enable-ino32
This option enables the file system to present 32-bit inodes instead of 64-bit inodes.
NOTE
The server specified in the mount command is used to fetch the glusterFS configuration
volfile, which describes the volume name. The client then communicates directly with
the servers mentioned in the volfile (which may not actually include the server used for
mount).
1. If a mount point has not yet been created for the volume, run the mkdir command to create a
mount point.
# mkdir /mnt/glusterfs
2. Run the mount -t glusterfs command, using the key in the task summary as a guide.
Using the example server names, the mount command contains the following replaced values. To
mount a subdirectory, append the subdirectory path to the volume name. If you want to specify the
transport type, add the transport option, as shown in the sketch below.
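A minimal sketch, assuming the example volume test-volume and the mount point /mnt/glusterfs used elsewhere in this chapter:
# mount -t glusterfs server1:/test-volume /mnt/glusterfs
# mount -t glusterfs server1:/test-volume/sub-dir /mnt/glusterfs
# mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs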
Using the command-line, verify the Red Hat Gluster Storage volumes have been successfully mounted.
All three commands can be run in the order listed, or used independently to verify a volume has been
successfully mounted.
Prerequisites
1. Run the mount command to check whether the volume was successfully mounted.
# mount
server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
OR
# mount
server1:/test-volume/sub-dir on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
If the transport option is used while mounting a volume, the mount status shows the transport type
appended to the volume name. For example, for transport=tcp:
# mount
server1:/test-volume.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
OR
# mount
server1:/test-volume/sub-dir.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
2. Run the df command to display the aggregated storage space from all the bricks in a volume.
# df -h /mnt/glusterfs
Filesystem Size Used Avail Use% Mounted on
server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
3. Move to the mount directory using the cd command, and list the contents.
# cd /mnt/glusterfs
# ls
6.2. NFS
Red Hat Gluster Storage has two NFS server implementations, Gluster NFS and NFS-Ganesha. Gluster
NFS supports only the NFSv3 protocol, whereas NFS-Ganesha supports the NFSv3, NFSv4.x, and pNFS
protocols.
NOTE
Red Hat does not recommend running NFS-Ganesha alongside any other NFS server,
such as the kernel NFS or Gluster NFS servers.
NOTE
From the Red Hat Gluster Storage 3.2 release onwards, the Gluster NFS server is
disabled by default for any newly created volumes. However, existing volumes that use
the Gluster NFS server are not impacted even after an upgrade to 3.3; the Gluster NFS
server remains implicitly enabled for them.
Differences in implementation of the NFSv3 standard in operating systems may result in some
operational issues. If issues are encountered when using NFSv3, contact Red Hat support to receive
more information on Red Hat Gluster Storage client operating system compatibility, and information
about known issues affecting NFSv3.
NFS ACL v3 is supported, which allows getfacl and setfacl operations on NFS clients. Access Control
Lists (ACL) can be configured in the glusterFS NFS server with the nfs.acl volume option. For example:
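A minimal sketch, assuming VOLNAME is a placeholder for your volume name:
# gluster volume set VOLNAME nfs.acl on
# gluster volume set VOLNAME nfs.acl off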
NOTE
ACL is ON by default.
Red Hat Gluster Storage includes Network Lock Manager (NLM) v4. The NLM protocol allows NFSv3
clients to lock files across the network. NLM is required to allow applications running on top of NFSv3
mount points to use the standard fcntl() (POSIX) and flock() (BSD) lock system calls to synchronize
access across clients.
This section describes how to use NFS to mount Red Hat Gluster Storage volumes (both manually and
automatically) and how to verify that the volume has been mounted successfully.
IMPORTANT
On Red Hat Enterprise Linux 7, enable the firewall service in the active zones for
runtime and permanent mode using the following commands:
# firewall-cmd --get-active-zones
To allow the firewall service in the active zones, run the following commands:
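A minimal sketch, assuming zone_name is a placeholder for the active zone reported by the previous command:
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind --permanent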
Section 6.2.2.2, “Using Gluster NFS to Mount Red Hat Gluster Storage Volumes”
In a replicated volume environment, the CTDB software (Cluster Trivial Database) has to be configured
to provide high availability and lock synchronization for Samba shares. CTDB provides high availability
by adding virtual IP addresses (VIPs) and a heartbeat service.
When a node in the trusted storage pool fails, CTDB enables a different node to take over the virtual IP
addresses that the failed node was hosting. This ensures the IP addresses for the services provided are
always available.
IMPORTANT
On Red Hat Enterprise Linux 7, enable the CTDB firewall service in the active zones for
runtime and permanent mode using the below commands:
# firewall-cmd --get-active-zones
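A minimal sketch of opening the CTDB internode port, assuming zone_name is a placeholder for the active zone:
# firewall-cmd --zone=zone_name --add-port=4379/tcp
# firewall-cmd --zone=zone_name --add-port=4379/tcp --permanent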
NOTE
Amazon Elastic Compute Cloud (EC2) does not support VIPs and is hence not
compatible with this solution.
6.2.2.1.1. Prerequisites
Follow these steps before configuring CTDB on a Red Hat Gluster Storage Server:
If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by
executing the following command:
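For example, using yum:
# yum remove ctdb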
After removing the older version, proceed with installing the latest CTDB.
NOTE
Ensure that the system is subscribed to the samba channel to get the latest
CTDB packages.
Install CTDB on all the nodes that are used as NFS servers to the latest version using the
following command:
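For example, using yum:
# yum install ctdb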
In a CTDB-based high availability environment of Samba/NFS, the locks will not be migrated on
failover.
Ensure that TCP port 4379 is open between the Red Hat Gluster Storage servers; this is the
internode communication port of CTDB.
To configure CTDB on Red Hat Gluster Storage server, execute the following steps:
1. Create a replicate volume. This volume will host only a zero byte lock file, hence choose
minimal sized bricks. To create a replicate volume run the following command:
where,
N: The number of nodes that are used as Gluster NFS servers. Each node must host one brick.
For example:
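A minimal sketch, assuming a volume named ctdb, four Gluster NFS server nodes, and hypothetical brick paths:
# gluster volume create ctdb replica 4 server1:/rhgs/brick1/ctdb server2:/rhgs/brick1/ctdb server3:/rhgs/brick1/ctdb server4:/rhgs/brick1/ctdb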
2. In the following files, replace "all" in the statement META="all" with the newly created volume
name:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
For example:
META="all"
to
META="ctdb"
3. Start the volume. The S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an
entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with the
Gluster NFS server. It also enables automatic start of the CTDB service on reboot.
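A minimal sketch, assuming the lock volume is named ctdb as in the earlier step:
# gluster volume start ctdb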
NOTE
When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs
on all Red Hat Gluster Storage servers and removes an entry in /etc/fstab/ for
the mount and unmounts the volume at /gluster/lock.
4. Verify that the file /etc/sysconfig/ctdb exists on all the nodes that are used as Gluster NFS servers.
This file contains Red Hat Gluster Storage recommended CTDB configurations.
5. Create the /etc/ctdb/nodes file on all the nodes that are used as Gluster NFS servers, and add the
IPs of these nodes to the file.
10.16.157.0
10.16.157.3
10.16.157.6
The IPs listed here are the private IPs of NFS servers.
6. On all the nodes that are used as Gluster NFS servers and that require IP failover, create the
/etc/ctdb/public_addresses file and add the virtual IPs that CTDB should create to this file.
Add these IP addresses in the following format:
For example:
192.168.1.20/24 eth0
192.168.1.21/24 eth0
7. Start the CTDB service on all the nodes by executing the following command:
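A minimal sketch, assuming systemd (Red Hat Enterprise Linux 7); on Red Hat Enterprise Linux 6, use the service command instead:
# systemctl start ctdb
# service ctdb start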
6.2.2.2. Using Gluster NFS to Mount Red Hat Gluster Storage Volumes
You can use either of the following methods to mount Red Hat Gluster Storage volumes:
NOTE
Currently GlusterFS NFS server only supports version 3 of NFS protocol. As a preferred
option, always configure version 3 as the default version in the nfsmount.conf file at
/etc/nfsmount.conf by adding the following text in the file:
Defaultvers=3
If the file is not modified, ensure that vers=3 is added manually in all the mount
commands.
The RDMA support in GlusterFS mentioned in the previous sections applies to communication between
bricks and the FUSE mount/GFAPI/NFS server. The NFS kernel client still communicates with the
GlusterFS NFS server over TCP.
For volumes that were created with only one transport type, communication between the GlusterFS
NFS server and the bricks uses that transport type. For a tcp,rdma volume, this can be changed using
the nfs.transport-type volume option, as shown in the sketch below.
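A minimal sketch, assuming VOLNAME is a placeholder for a tcp,rdma volume:
# gluster volume set VOLNAME nfs.transport-type rdma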
After mounting a volume, you can test the mounted volume using the procedure described in
Section 6.2.2.2.4, “Testing Volumes Mounted Using Gluster NFS”.
Create a mount point and run the mount command to manually mount a Red Hat Gluster Storage
volume using Gluster NFS.
1. If a mount point has not yet been created for the volume, run the mkdir command to create a
mount point.
# mkdir /mnt/glusterfs
For Linux
For Solaris
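A minimal sketch of the Linux and Solaris forms, assuming the example server and volume names used in this chapter (the Solaris syntax, including port 38467, is an assumption):
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
# mount -o vers=3 nfs://server1:38467/test-volume /mnt/glusterfs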
Manually Mount a Red Hat Gluster Storage Volume using Gluster NFS over TCP
Create a mount point and run the mount command to manually mount a Red Hat Gluster Storage
volume using Gluster NFS over TCP.
NOTE
glusterFS NFS server does not support UDP. If an NFS client, such as a Solaris client,
connects by default using UDP, the following message appears:
Currently, MOUNT over UDP does not have support for mounting subdirectories
on a volume. Mounting server:/volume/subdir exports is only functional
when MOUNT over TCP is used.
MOUNT over UDP does not currently have support for different authentication
options that MOUNT over TCP honors. Enabling nfs.mount-udp may give
more permissions to NFS clients than intended via various authentication
options like nfs.rpc-auth-allow, nfs.rpc-auth-reject and
nfs.export-dir.
1. If a mount point has not yet been created for the volume, run the mkdir command to create a
mount point.
# mkdir /mnt/glusterfs
2. Run the correct mount command for the system, specifying the TCP protocol option for the
system.
For Linux
For Solaris
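A minimal sketch, assuming the example names used in this chapter (the Solaris syntax is an assumption):
# mount -t nfs -o vers=3,mountproto=tcp server1:/test-volume /mnt/glusterfs
# mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs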
Red Hat Gluster Storage volumes can be mounted automatically using Gluster NFS, each time the
system starts.
NOTE
In addition to the tasks described below, Red Hat Gluster Storage supports the standard
auto-mounting method of Linux, UNIX, and similar operating systems for Gluster NFS mounts.
Update the /etc/auto.master and /etc/auto.misc files, and restart the autofs
service. Whenever a user or process attempts to access the directory it will be mounted
in the background on-demand.
Mount a Red Hat Gluster Storage Volume automatically using NFS at server start.
Using the example server names, the entry contains the following replaced values.
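A minimal sketch of the /etc/fstab entry, assuming the example server and volume names (the exact option set is an assumption):
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,vers=3 0 0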
Mount a Red Hat Gluster Storage Volume automatically using NFS over TCP at server start.
Using the example server names, the entry contains the following replaced values.
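A minimal sketch of the /etc/fstab entry for NFS over TCP, assuming the example names (the exact option set is an assumption):
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0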
The nfs.export-dir and nfs.export-dirs options provide granular control to restrict or allow
specific clients to mount a sub-directory. These clients can be authenticated during sub-directory
mount with either an IP, host name or a Classless Inter-Domain Routing (CIDR) range.
nfs.export-dirs
This option is enabled by default. It allows the sub-directories of exported volumes to be mounted
by clients without needing to export individual sub-directories. When enabled, all sub-directories of
all volumes are exported. When disabled, sub-directories must be exported individually in order to
mount them on clients.
To disable this option for all volumes, run the following command:
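A minimal sketch, assuming the option is applied per volume with VOLNAME as a placeholder (repeat for each volume as needed):
# gluster volume set VOLNAME nfs.export-dirs off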
nfs.export-dir
When nfs.export-dirs is set to on, the nfs.export-dir option allows you to specify one or
more sub-directories to export, rather than exporting all subdirectories (nfs.export-dirs on),
or only exporting individually exported subdirectories (nfs.export-dirs off).
The subdirectory path should be the path from the root of the volume. For example, in a volume
with six subdirectories, to export the first three subdirectories, the command would be the
following:
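A minimal sketch, assuming a volume named myvolume with subdirectories /dir1, /dir2, and /dir3 (hypothetical names):
# gluster volume set myvolume nfs.export-dir /dir1,/dir2,/dir3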
Subdirectories can also be exported based on the IP address, hostname, or a Classless Inter-
Domain Routing (CIDR) range by adding these details in parentheses after the directory path:
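A minimal sketch, assuming hypothetical client addresses and hostnames:
# gluster volume set myvolume nfs.export-dir "/dir1(192.168.10.101),/dir2(storage.example.com),/dir3(192.168.98.0/24)"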
You can confirm that Red Hat Gluster Storage directories are mounting successfully.
Using the command-line, verify the Red Hat Gluster Storage volumes have been successfully mounted.
All three commands can be run in the order listed, or used independently to verify a volume has been
successfully mounted.
Prerequisites
1. Run the mount command to check whether the volume was successfully mounted.
# mount
server1:/test-volume on /mnt/glusterfs type nfs (rw,addr=server1)
2. Run the df command to display the aggregated storage space from all the bricks in a volume.
# df -h /mnt/glusterfs
Filesystem Size Used Avail Use% Mounted on
server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
3. Move to the mount directory using the cd command, and list the contents.
# cd /mnt/glusterfs
# ls
Q: The mount command on the NFS client fails with RPC Error: Program not registered.
This error is encountered due to one of the following reasons:
The NFS server is not running. You can check the status using the following
command:
The volume is not started. You can check the status using the following command:
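A minimal sketch of the status checks, assuming VOLNAME is a placeholder:
# gluster volume status VOLNAME
# gluster volume info VOLNAME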
A: If the NFS server is not running, then restart the NFS server using the following
command:
If the volume is not started, then start the volume using the following command:
If both rpcbind and the NFS server are running, then restart the NFS server using the following
commands:
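A minimal sketch of starting a stopped volume, assuming VOLNAME is a placeholder (restarting the Gluster NFS server itself is typically done by restarting the volume or the glusterd service, depending on your setup):
# gluster volume start VOLNAME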
Q: The rpcbind service is not running on the NFS client. This could be due to the following
reasons:
Q: The NFS server glusterfsd starts but the initialization fails with an rpc-service: portmap
registration of program failed error message in the log.
A: NFS start-up succeeds but the initialization of the NFS service can still fail preventing clients
from accessing the mount points. Such a situation can be confirmed from the following error
messages in the log file:
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could not register with portmap
[2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed
[2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:33:47] C [nfs.c:531:notify] nfs: Failed to initialize protocols
[2010-05-26 23:33:49] E [rpcsvc.c:2614:rpcsvc_program_unregister_portmap] rpc-service: Could not unregister with portmap
[2010-05-26 23:33:49] E [rpcsvc.c:2731:rpcsvc_program_unregister] rpc-service: portmap unregistration of program failed
[2010-05-26 23:33:49] E [rpcsvc.c:2744:rpcsvc_program_unregister] rpc-service: Program unregistration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
1. Start the rpcbind service on the NFS server by running the following command:
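A minimal sketch; use the form that matches your release:
# service rpcbind start
# systemctl start rpcbind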
Such an error is also seen when there is another NFS server running on the same
machine but it is not the glusterFS NFS server. On Linux systems, this could be the kernel
NFS server. Resolution involves stopping the other NFS server or not running the
glusterFS NFS server on the machine. Before stopping the kernel NFS server, ensure that
no critical service depends on access to that NFS server's exports.
On Linux, kernel NFS servers can be stopped by using either of the following commands
depending on the distribution in use:
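A minimal sketch, assuming Red Hat Enterprise Linux; other distributions may use different service names:
# service nfs stop
# systemctl stop nfs-server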
Q: The NFS server start-up fails with the message Port is already in use in the log file.
A: This error can arise in case there is already a glusterFS NFS server running on the same machine.
This situation can be confirmed from the log file, if the following error lines exist:
In this release, the glusterFS NFS server does not support running multiple NFS servers on the
same machine. To resolve the issue, one of the glusterFS NFS servers must be shut down.
The NFS server attempts to authenticate NFS clients by performing a reverse DNS
lookup to match host names in the volume file with the client IP addresses. There can be
a situation where the NFS server either is not able to connect to the DNS server or the
DNS server is taking too long to respond to DNS request. These delays can result in
delayed replies from the NFS server to the NFS client resulting in the timeout error.
The NFS server provides a workaround that disables DNS requests and instead relies only on
the client IP addresses for authentication. The following option can be added for
successful mounting in such situations:
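A minimal sketch, assuming the nfs.addr-namelookup volume option is the setting intended here and VOLNAME is a placeholder:
# gluster volume set VOLNAME nfs.addr-namelookup off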
NOTE
NFS version used by the NFS client is other than version 3 by default.
glusterFS NFS server supports version 3 of NFS protocol by default. In recent Linux
kernels, the default NFS version has been changed from 3 to 4. It is possible that the
client machine is unable to connect to the glusterFS NFS server because it is using
version 4 messages, which are not understood by the glusterFS NFS server. The timeout can
be resolved by forcing the NFS client to use version 3. The vers option to the mount
command is used for this purpose:
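A minimal sketch, assuming the example server and volume names:
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs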
Q: The showmount command fails with clnt_create: RPC: Unable to receive error. This error is
encountered due to the following reasons:
A: Check the firewall settings, and open port 111 for portmap requests/replies and glusterFS NFS
server requests/replies. The glusterFS NFS server operates over the following port numbers: 38465,
38466, and 38467.
Q: The application fails with Invalid argument or Value too large for defined data type
A: These two errors generally happen for 32-bit NFS clients, or applications that do not support 64-
bit inode numbers or large files.
Use the following option from the command-line interface to make glusterFS NFS return 32-bit
inode numbers instead:
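A minimal sketch, assuming VOLNAME is a placeholder:
# gluster volume set VOLNAME nfs.enable-ino32 on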
This option is off by default, which permits NFS to return 64-bit inode numbers by default.
Applications that will benefit from this option include those that are:
built and run on 32-bit machines, which do not support large files by default,
Applications which can be rebuilt from source are recommended to be rebuilt using the following
flag with gcc:
-D_FILE_OFFSET_BITS=64
Q: After the machine that is running NFS server is restarted the client fails to reclaim the locks
held earlier.
A: The Network Status Monitor (NSM) service daemon (rpc.statd) is started before gluster NFS
server. Hence, NSM sends a notification to the client to reclaim the locks. When the clients send
the reclaim request, the NFS server does not respond as it is not started yet. Hence the client
request fails.
Solution: To resolve the issue, prevent the NSM daemon from starting when the server starts.
Run chkconfig --list nfslock to check whether NSM is configured to start during boot. If any of the
entries are on, run chkconfig nfslock off to disable NSM clients during boot, which resolves the issue.
Q: The rpc actor failed to complete successfully error is displayed in the nfs.log,
even after the volume is mounted successfully.
A: Gluster NFS supports only NFS version 3. When nfs-utils mounts a client without the version
specified, it negotiates using version 4 before falling back to version 3. This is the cause
of the messages in both the server log and the nfs.log file.
To resolve the issue, declare NFS version 3 and the noacl option in the mount command as
follows:
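A minimal sketch, assuming the example server and volume names:
# mount -t nfs -o vers=3,noacl server1:/test-volume /mnt/glusterfs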
Red Hat Gluster Storage 3.3 is supported with the community’s V2.4.4 stable release of NFS-Ganesha
on Red Hat Enterprise Linux 7. To understand the various supported features of NFS-Ganesha, see
Supported Features of NFS-Ganesha.
NOTE
To install NFS-Ganesha, refer to Deploying NFS-Ganesha on Red Hat Gluster Storage in the
Red Hat Gluster Storage 3.3 Installation Guide.
Red Hat Gluster Storage does not support NFSv4 delegations. For more information,
refer to the Support matrix.
Data coherency across the multi-head NFS-Ganesha servers in the cluster is achieved using Gluster’s
Upcall infrastructure. Gluster’s Upcall infrastructure is a generic and extensible framework
that sends notifications to the respective glusterfs clients (in this case NFS-Ganesha server) when
changes are detected in the back-end file system.
pNFS (Tech-Preview)
The Parallel Network File System (pNFS) is part of the NFS v4.1 protocol that allows compute clients to
access storage devices directly and in parallel.
NOTE
To set up NFS Ganesha, follow the steps mentioned in the further sections.
NOTE
You can also set up NFS-Ganesha using gdeploy, which automates the steps mentioned
below. For more information, see "Deploying NFS-Ganesha"
The following table lists the port details for NFS-Ganesha cluster setup:
Service    Port Number    Protocol
sshd    22    TCP
NOTE
The port details for the Red Hat Gluster Storage services are listed under section 3.
Verifying Port Access.
NOTE
For the NFS client to use the LOCK functionality, the ports used by the LOCKD and STATD
daemons have to be configured and opened via firewalld on the client machine:
3. Open the ports that are configured in the first step using the following
command:
4. To ensure that the NFS client UDP mount does not fail, open port 2049 by
executing the following command:
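A minimal sketch, assuming zone_name is a placeholder for the active zone:
# firewall-cmd --zone=zone_name --add-port=2049/udp
# firewall-cmd --zone=zone_name --add-port=2049/udp --permanent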
Firewall Settings
On Red Hat Enterprise Linux 7, enable the firewall services mentioned below.
1. Get a list of active zones using the following command:
# firewall-cmd --get-active-zones
2. Allow the firewall services in the active zones by running the following commands:
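A minimal sketch, assuming zone_name is a placeholder; services not predefined in firewalld (for example NLM and rquota) can be opened by port instead, per the port table above:
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind --add-service=mountd --add-service=high-availability
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind --add-service=mountd --add-service=high-availability --permanent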
Ensure that the following prerequisites are taken into consideration before you run NFS-Ganesha in
your environment:
A Red Hat Gluster Storage volume must be available for export, and the NFS-Ganesha RPMs must be
installed.
NOTE
Reserve virtual IPs on the network for each of the servers configured in the ganesha.conf file.
Ensure that these IPs are different than the hosts' static IPs and are not used anywhere else in
the trusted storage pool or in the subnet.
Ensure that all the nodes in the cluster are DNS resolvable. For example, you can populate the
/etc/hosts with the details of all the nodes in the cluster.
On Red Hat Enterprise Linux 7, execute the following commands to disable and stop
NetworkManager service and to enable the network service.
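A minimal sketch:
# systemctl disable NetworkManager
# systemctl stop NetworkManager
# systemctl enable network
# systemctl start network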
Create and mount a gluster shared volume by executing the following command:
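A minimal sketch, using the gluster shared-storage option (this creates and mounts the gluster_shared_storage volume on the cluster nodes):
# gluster volume set all cluster.enable-shared-storage enable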
For more information, see Section 11.10, “Setting up Shared Storage Volume”
The HA cluster is maintained using Pacemaker and Corosync. Pacemaker acts as a resource manager
and Corosync provides the communication layer of the cluster. For more information about
Pacemaker/Corosync, see the documentation under the Clustering section of the Red Hat Enterprise
Linux 7 documentation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/
NOTE
To start pcsd by default after the system is rebooted, execute the following
command:
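A minimal sketch:
# systemctl enable pcsd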
3. Set a password for the user ‘hacluster’ on all the nodes using the following command. Use the
same password for all the nodes:
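A minimal sketch:
# passwd hacluster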
4. Perform cluster authentication between the nodes, where the username is ‘hacluster’ and the
password is the one you used in the previous step. Ensure that you execute the following command
on every node:
NOTE
The hostname of all the nodes in the Ganesha-HA cluster must be included in
the command when executing it on every node.
For example, in a four-node cluster with nodes nfs1, nfs2, nfs3, and nfs4, execute the following
command on every node:
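A minimal sketch, assuming the pcs 0.9 syntax shipped with Red Hat Enterprise Linux 7:
# pcs cluster auth nfs1 nfs2 nfs3 nfs4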
5. Passwordless SSH for the root user has to be enabled on all the HA nodes. Follow these steps:
1. Generate a key pair on one of the nodes (node1), as shown in the sketch after step 2.
2. Deploy the generated public key from node1 to all the nodes (including node1) by executing
the following command for every node:
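A minimal sketch, assuming the key path /var/lib/glusterd/nfs/secret.pem used by the scp step below:
# ssh-keygen -f /var/lib/glusterd/nfs/secret.pem -t rsa -N ''
# ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@<node-ip/hostname>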
3. Copy the ssh keypair from node1 to all the nodes in the Ganesha-HA cluster by executing
the following command for every node:
# scp -i /var/lib/glusterd/nfs/secret.pem /var/lib/glusterd/nfs/secret.* root@<node-ip/hostname>:/var/lib/glusterd/nfs/
6. As part of the cluster setup, port 875 is used to bind to the Rquota service. If this port is already in
use, assign a different port to this service by modifying the following line in the
/etc/ganesha/ganesha.conf file on all the nodes.
The ganesha-ha.conf.sample file is created in /etc/ganesha when Red Hat Gluster Storage is installed.
Rename the file to ganesha-ha.conf and make the changes based on your environment.
NOTE
1. If you have upgraded to Red Hat Enterprise Linux 7.4, then enable the gluster_use_execmem
boolean by executing the following command:
# setsebool -P gluster_use_execmem on
NOTE
Before enabling or disabling NFS-Ganesha, ensure that all the nodes that are
part of the NFS-Ganesha cluster are up.
For example,
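A minimal sketch of enabling or disabling NFS-Ganesha through the gluster CLI:
# gluster nfs-ganesha enable
# gluster nfs-ganesha disable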
NOTE
After enabling NFS-Ganesha, if rpcinfo -p shows a statd port other than
662, then restart the statd service:
For example,
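A minimal sketch, assuming Red Hat Enterprise Linux 7 (rpc-statd is the systemd unit for statd):
# systemctl restart rpc-statd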
# /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
NOTE
Disabling NFS Ganesha does not enable Gluster NFS by default. If required,
Gluster NFS must be enabled manually.
For example:
This command unexports the Red Hat Gluster Storage volume without affecting other exports.
For example:
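A minimal sketch of exporting and unexporting a volume through the gluster CLI, assuming the ganesha.enable volume option and a placeholder volume name:
# gluster volume set VOLNAME ganesha.enable on
# gluster volume set VOLNAME ganesha.enable off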
To verify the status of the volume set options, follow the guidelines mentioned below:
For example:
# showmount -e localhost
Export list for localhost:
/volname (everyone)
The logs of the ganesha.nfsd daemon are written to /var/log/ganesha.log. Check the log file if you
notice any unexpected behavior.
NFS-Ganesha exports can be accessed by mounting them in either NFSv3 or NFSv4 mode. Since this is
an active-active HA configuration, the mount operation can be performed from the VIP of any node.
For better large-file performance on all workloads generated on Red Hat Enterprise Linux 7
clients, it is recommended to set the following tunables before mounting the volume:
# sysctl -w sunrpc.tcp_slot_table_entries=128
# echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
# echo 128 > /proc/sys/sunrpc/tcp_max_slot_table_entries
NOTE
Ensure that NFS clients and NFS-Ganesha servers in the cluster are DNS resolvable with
unique host-names to use file locking through Network Lock Manager (NLM) protocol.
For example, the export can be mounted in either NFSv3 or NFSv4 mode from a VIP, as shown in the
sketch below:
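A minimal sketch, assuming placeholder values for the VIP, volume name, and mount point:
# mount -t nfs -o vers=3 <VIP>:/<volname> /mnt
# mount -t nfs -o vers=4 <VIP>:/<volname> /mnt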
To display the IP addresses of clients that have mounted the NFS exports, execute the following
command:
NOTE
If the NFS export is unmounted or if a client is disconnected from the server, it may take
a few minutes for this to be updated in the command output.
To modify the existing HA cluster and to change the default values of the exports, use the
ganesha-ha.sh script located at /usr/libexec/ganesha/.
Before adding a node to the cluster, ensure that the firewall services are enabled as mentioned in Port
Information for NFS-Ganesha, and that the prerequisites mentioned in the section Pre-requisites to run
NFS-Ganesha are met.
To add a node to the cluster, execute the following command on any of the nodes in the existing NFS-
Ganesha cluster:
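A minimal sketch of the syntax, inferred from the example below:
# /usr/libexec/ganesha/ganesha-ha.sh --add <HA_CONF_DIR> <HOSTNAME> <NODE-VIP>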
where,
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at
/run/gluster/shared_storage/nfs-ganesha.
HOSTNAME: The hostname of the new node to be added.
NODE-VIP: The virtual IP of the new node to be added.
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha server16 10.00.00.01
To delete a node from the cluster, execute the following command on any of the nodes in the existing
NFS-Ganesha cluster:
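A minimal sketch of the syntax, inferred from the example below:
# /usr/libexec/ganesha/ganesha-ha.sh --delete <HA_CONF_DIR> <HOSTNAME>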
where,
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at
/run/gluster/shared_storage/nfs-ganesha.
HOSTNAME: The hostname of the node to be deleted.
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --delete /var/run/gluster/shared_storage/nfs-ganesha server16
It is recommended to use gluster CLI options to export or unexport volumes through NFS-Ganesha.
However, this section provides some information on changing configurable parameters in NFS-
Ganesha. Such parameter changes require NFS-Ganesha to be started manually.
To modify the default export configurations, perform the following steps on any of the nodes in the
existing ganesha cluster:
where:
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located
at /run/gluster/shared_storage/nfs-ganesha.
volname: The name of the volume whose export configuration has to be changed.
The following are the default set of parameters required to export any entry. The values given here are
the default values used by the CLI options to start or stop NFS-Ganesha.
# cat export.conf
EXPORT{
    Export_Id = 1 ;   # Export ID unique to each export
    Path = "volume_path";   # Path of the volume to be exported. Eg: "/test_volume"
    FSAL {
        name = GLUSTER;
        hostname = "10.xx.xx.xx";   # IP of one of the nodes in the trusted pool
        volume = "volume_name";   # Volume name. Eg: "test_volume"
    }
Providing Permissions for Specific Clients
The parameter values and permission values given in the EXPORT block apply to any client that
mounts the exported volume. To provide specific permissions to specific clients, introduce a client
block inside the EXPORT block.
For example, to assign specific permissions for client 10.00.00.01, add the following block in the
EXPORT block.
client {
clients = 10.00.00.01; # IP of the client.
access_type = "RO"; # Read-only permissions
Protocols = "3"; # Allow only NFSv3 protocol.
anonymous_uid = 1440;
anonymous_gid = 72;
}
The following section describes various configurations possible via NFS-Ganesha. Minor changes have
to be made to the export.conf file to see the expected behavior.
All the other clients inherit the permissions that are declared outside the client block.
To enable NFSv4 ACLs, specify the following parameter in the export file:
Disable_ACL = FALSE;
NOTE
NFS clients should remount their share after enabling/disabling ACLs on the NFS-
Ganesha server.
To set an NFSv4 pseudo path, edit the Pseudo parameter in the export file. This path has to be used
while mounting the export entry in NFSv4 mode.
2. To export subdirectories within a volume, edit the following parameters in the export.conf file.
FSAL {
    name = GLUSTER;
    hostname = "10.xx.xx.xx";   # IP of one of the nodes in the trusted pool
    volume = "volume_name";   # Volume name. Eg: "test_volume"
    volpath = "path_to_subdirectory_with_respect_to_volume";   # Subdirectory path from the root of the volume. Eg: "/test_subdir"
}
3. Change Export_Id to an unused value. It should preferably be a larger value so that it cannot
be re-used for other volumes.
NOTE
If there are multiple sub-directories to be exported, create EXPORT blocks for each
such sub-directory and then restart the nfs-ganesha service.
1. Install the krb5-workstation and the ntpdate packages on all the machines:
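A minimal sketch:
# yum install krb5-workstation ntpdate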
2. Configure the ntpdate based on the valid time server according to the environment:
3. Ensure that all systems can resolve each other by FQDN in DNS.
4. Configure the /etc/krb5.conf file and add relevant changes accordingly. For example:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
default_realm = EXAMPLE.COM
default_ccache_name = KEYRING:persistent:%{uid}
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
[domain_realm]
.example.com = EXAMPLE.COM
example.com = EXAMPLE.COM
NOTE
For further details regarding the file configuration, refer to man krb5.conf.
5. On the NFS-server and client, update the /etc/idmapd.conf file by making the required change.
For example:
Domain = example.com
NOTE
Before setting up the NFS-Ganesha server, make sure to set up the KDC based on the
requirements.
2. Install the relevant gluster and NFS-Ganesha rpms. For more information see, Red Hat Gluster
Storage 3.3 Installation Guide.
3. Create a Kerberos principal and add it to krb5.keytab on the NFS server. For example:
$ kadmin
$ kadmin: addprinc -randkey nfs/<host_name>@EXAMPLE.COM
$ kadmin: ktadd nfs/<host_name>@EXAMPLE.COM
For example:
# kadmin
Authenticating as principal root/admin@EXAMPLE.COM with password.
Password for root/admin@EXAMPLE.COM:
4. Update the /etc/ganesha/ganesha.conf file with the following block:
NFS_KRB5
{
PrincipalName = nfs ;
KeytabPath = /etc/krb5.keytab ;
Active_krb5 = true ;
}
5. Based on the different Kerberos security flavours (krb5, krb5i and krb5p) supported by nfs-
ganesha, configure the 'SecType' parameter in the volume export file
(/var/run/gluster/shared_storage/nfs-ganesha/exports) with the appropriate security flavour.
6. Create an unprivileged user and ensure that the users that are created are resolvable to the
UIDs through the central user database. For example:
# useradd guest
NOTE
The username of this user has to be the same as the one on the NFS-client.
NOTE
For detailed information on setting up NFS clients for security on Red Hat Enterprise
Linux, see Section 8.8.2 NFS Security in the Red Hat Enterprise Linux 7 Storage
Administration Guide.
2. Create a Kerberos principal and add it to krb5.keytab on the client side. For example:
# kadmin
# kadmin: addprinc -randkey host/<host_name>@EXAMPLE.COM
# kadmin: ktadd host/<host_name>@EXAMPLE.COM
# kadmin
Authenticating as principal root/admin@EXAMPLE.COM with password.
Password for root/admin@EXAMPLE.COM:
3. Check the status of nfs-client.target service and start it, if not already started:
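A minimal sketch:
# systemctl status nfs-client.target
# systemctl start nfs-client.target
# systemctl enable nfs-client.target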
4. Create an unprivileged user and ensure that the users that are created are resolvable to the
UIDs through the central user database. For example:
# useradd guest
NOTE
The username of this user has to be the same as the one on the NFS-server.
For example:
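A minimal sketch of mounting the export with Kerberos security, assuming placeholder host and volume names:
# mount -t nfs -o sec=krb5 <host_name>:/testvolume /mnt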
Creation of a directory on the mount point and all other operations as root should be
successful.
# su - guest
Without a kerberos ticket, all access to /mnt should be denied. For example:
# su guest
# ls
ls: cannot open directory .: Permission denied
7. Get the kerberos ticket for the guest and access /mnt:
# kinit
Password for guest@EXAMPLE.COM:
# ls
<directory created>
IMPORTANT
With this ticket, some access to /mnt must be allowed. If there are directories
on the NFS server that "guest" does not have access to, access to them is
correctly denied.
The following list describes how the time taken for the NFS server to detect a server reboot or resume
is calculated.
If the ganesha.nfsd dies (crashes, oomkill, admin kill), the maximum time to detect it and put
the ganesha cluster into grace is 20sec, plus whatever time pacemaker needs to effect the
fail-over.
NOTE
This time taken to detect if the service is down, can be edited using the
following command on all the nodes:
If the whole node dies (including network failure) then this down time is the total of whatever
time pacemaker needs to detect that the node is gone, the time to put the cluster into grace,
and the time to effect the fail-over. This is ~20 seconds.
So the max-fail-over time is approximately 20-22 seconds, and the average time is typically
less. In other words, the time taken for NFS clients to detect server reboot or resume I/O is 20
- 22 seconds.
After failover, there is a short period of time during which clients try to reclaim their lost OPEN/LOCK
state. Servers block certain file operations during this period, as per the NFS specification. The file
operations blocked are as follows:
Table 6.6.
Protocols    FOPs
NFSV3    SETATTR
NLM    LOCK, UNLOCK, SHARE, UNSHARE, CANCEL, LOCKT
NFSV4    LOCK, LOCKT, OPEN, REMOVE, RENAME, SETATTR
NOTE
LOCK, SHARE, and UNSHARE will be blocked only if it is requested with reclaim set to
FALSE.
OPEN will be blocked if requested with claim type other than CLAIM_PREVIOUS or
CLAIM_DELEGATE_PREV.
The default value for the grace period is 90 seconds. This value can be changed by adding the following
lines in the /etc/ganesha/ganesha.conf file.
NFSv4 {
Grace_Period=<grace_period_value_in_sec>;
}
After editing the /etc/ganesha/ganesha.conf file, restart the NFS-Ganesha service using the
following command on all the nodes:
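A minimal sketch, assuming Red Hat Enterprise Linux 7:
# systemctl restart nfs-ganesha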
6.2.3.8. pNFS
IMPORTANT
pNFS is a Technology Preview feature. Technology Preview features are not fully
supported under Red Hat service-level agreements (SLAs), may not be functionally
complete, and are not intended for production use. However, these features provide
early access to upcoming product innovations, enabling customers to test functionality
and provide feedback during the development process. As Red Hat considers making
future iterations of technology preview features generally available, we will provide
commercially reasonable support to resolve any reported issues that customers
experience when using these features.
The Parallel Network File System (pNFS) is part of the NFS v4.1 protocol that allows compute clients to
access storage devices directly and in parallel. The pNFS cluster consists of Meta-Data-Server (MDS)
and Data-Server (DS). The client sends all the read/write requests directly to DS and all the other
operations are handled by the MDS.
The current architecture supports only a single MDS and multiple data servers. The server with which
the client mounts acts as the MDS, and all servers, including the MDS, can act as a DS.
6.2.3.8.1. Prerequisites
Disable kernel-NFS, glusterFS-NFS servers on the system using the following commands:
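A minimal sketch, assuming Red Hat Enterprise Linux 7 and a placeholder volume name:
# systemctl stop nfs-server
# systemctl disable nfs-server
# gluster volume set VOLNAME nfs.disable on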
Disable nfs-ganesha and tear down HA cluster via gluster CLI (only if nfs-ganesha HA cluster is
already created) by executing the following command:
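A minimal sketch:
# gluster nfs-ganesha disable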
Configure the MDS by adding following block to the ganesha.conf file located at
/etc/ganesha:
GLUSTER
{
PNFS_MDS = true;
}
For optimal working of pNFS, NFS-Ganesha servers should run on every node in the trusted
pool using the following command:
On RHEL 7
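A minimal sketch:
# systemctl start nfs-ganesha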
Verify if the volume is exported via NFS-Ganesha on all the nodes by executing the following
command:
# showmount -e localhost
Mount the volume using NFS-Ganesha MDS server in the trusted pool using the following command.
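A minimal sketch, assuming placeholder values for the MDS address, volume, and mount point:
# mount -t nfs4 -o minorversion=1 <MDS-ip>:/<volname> /<mount_point>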
It is recommended to use gluster CLI options to export or unexport volumes through NFS-Ganesha.
However, this section provides some information on changing configurable parameters in NFS-
Ganesha. Such parameter changes require NFS-Ganesha to be started manually.
To modify the default export configurations perform the following steps on any of the nodes in the
existing ganesha cluster:
1. Edit/add the required fields in the corresponding export configuration file in the
/run/gluster/shared_storage/nfs-ganesha/exports directory.
where:
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located
at /etc/ganesha.
volname: The name of the volume whose export configuration has to be changed.
The following are the default set of parameters required to export any entry. The values given here are
the default values used by the CLI options to start or stop NFS-Ganesha.
# cat export.conf
EXPORT{
Export_Id = 1 ; # Export ID unique to each export
Path = "volume_path"; # Path of the volume to be exported. Eg:
"/test_volume"
FSAL {
name = GLUSTER;
hostname = "10.xx.xx.xx"; # IP of one of the nodes in the trusted
pool
volume = "volume_name"; # Volume name. Eg: "test_volume"
The following section describes various configurations possible via NFS-Ganesha. Minor changes have
to be made to the export.conf file to see the expected behavior.
Exporting Subdirectories
To export subdirectories within a volume, edit the following parameters in the export.conf file.
FSAL {
    name = GLUSTER;
    hostname = "10.xx.xx.xx";   # IP of one of the nodes in the trusted pool
    volume = "volume_name";   # Volume name. Eg: "test_volume"
    volpath = "path_to_subdirectory_with_respect_to_volume";   # Subdirectory path from the root of the volume. Eg: "/test_subdir"
}
For example, to assign specific permissions for client 10.00.00.01, add the following block in the
EXPORT block.
client {
clients = 10.00.00.01; # IP of the client.
allow_root_access = true;
access_type = "RO"; # Read-only permissions
Protocols = "3"; # Allow only NFSv3 protocol.
anonymous_uid = 1440;
anonymous_gid = 72;
}
All the other clients inherit the permissions that are declared outside the client block.
To enable NFSv4 ACLs, specify the following parameter in the export file:
Disable_ACL = FALSE;
To set an NFSv4 pseudo path, edit the Pseudo parameter in the export file. This path has to be used
while mounting the export entry in NFSv4 mode.
6.2.3.10. Troubleshooting
Mandatory checks
Ensure you execute the following checks for all the issues/failures that are encountered, and review
the corresponding log files:
/var/log/ganesha.log
/var/log/ganesha-gfapi.log
/var/log/messages
/var/log/pcsd.log
Situation
NFS-Ganesha fails to start.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding
with the following steps. Follow the listed steps to fix the issue:
2. Ensure that the port 875 is free to connect to the RQUOTA service.
3. Ensure that the shared storage volume mount exists on the server after node
reboot/shutdown. If it does not, then mount the shared storage volume manually using the
following command:
# mount -t glusterfs <local_node's_hostname>:gluster_shared_storage /var/run/gluster/shared_storage
Situation
NFS-Ganesha port 875 is unavailable.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding
with the following steps. Follow the listed steps to fix the issue:
1. Run the following command to extract the PID of the process using port 875:
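A minimal sketch, assuming netstat is available:
# netstat -anlp | grep 875
Re-run the same command after killing the process to confirm that port 875 is free.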
2. Determine if the process using port 875 is an important system or user process.
3. Perform one of the following depending upon the importance of the process:
If the process using port 875 is not an important system or user process:
1. Run the following command to kill the process using port 875:
# kill pid;
2. Run the following command to ensure that the process is killed and port 875 is
free to use:
Situation
NFS-Ganesha Cluster setup fails.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding
with the following steps.
2. Ensure that the pcs cluster auth command is executed on all the nodes with the same
password for the user hacluster.
4. Ensure that the name of the HA Cluster does not exceed 15 characters.
Situation
NFS-Ganesha has started and fails to export a volume.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding
with the following steps. Follow the listed steps to fix the issue:
/var/log/ganesha.log
/var/log/ganesha-gfapi.log
/var/log/messages
5. If the volume is not in a started state, run the following command to start the volume.
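A minimal sketch, assuming VOLNAME is a placeholder:
# gluster volume start VOLNAME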
If the volume is not exported as part of volume start, run the following command to re-
export the volume:
# /usr/libexec/ganesha/dbus-send.sh /var/run/gluster/shared_storage on <volname>
Situation
Adding a new node to the HA cluster fails.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding
with the following steps. Follow the listed steps to fix the issue:
1. Ensure to run the following command from one of the nodes that is already part of the
cluster:
3. Make sure that all the nodes of the cluster are DNS resolvable from the node that needs to
be added.
4. Execute the following command for each of the hosts in the HA cluster on the node that
needs to be added:
Situation
Cleanup required when nfs-ganesha HA cluster setup fails.
Solution
To restore the machines to their original state, execute the following commands on each
node forming the cluster:
# /usr/libexec/ganesha/ganesha-ha.sh --teardown /var/run/gluster/shared_storage/nfs-ganesha
# /usr/libexec/ganesha/ganesha-ha.sh --cleanup /var/run/gluster/shared_storage/nfs-ganesha
# systemctl stop nfs-ganesha
Situation
Permission issues.
Solution
By default, the root squash option is disabled when you start NFS-Ganesha using the CLI. If
you encounter any permission issues, check the UNIX permissions of the exported entry.
6.3. SMB
The Server Message Block (SMB) protocol can be used to access Red Hat Gluster Storage volumes by
exporting directories in GlusterFS volumes as SMB shares on the server.
This section describes how to enable SMB shares, how to mount SMB shares on Microsoft Windows-
based clients (both manually and automatically) and how to verify if the share has been mounted
successfully.
NOTE
The Mac OS X command line can be used to access Red Hat Gluster Storage volumes
using SMB.
In Red Hat Gluster Storage, Samba is used to share volumes through SMB protocol.
WARNING
Samba version 3 is not supported. Ensure that you are using Samba 4.x. For more
information regarding the installation and upgrade steps, refer to the Red Hat
Gluster Storage 3.3 Installation Guide.
CTDB version 4.x is required for Red Hat Gluster Storage 3.2 and higher.
This is provided in the Red Hat Gluster Storage Samba channel. For more
information regarding the installation and upgrade steps, refer to the Red Hat
Gluster Storage 3.3 Installation Guide.
IMPORTANT
On Red Hat Enterprise Linux 7, enable the Samba firewall service in the active zones for
runtime and permanent mode using the following commands:
# firewall-cmd --get-active-zones
To allow the firewall services in the active zones, run the following commands
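A minimal sketch, assuming zone_name is a placeholder for the active zone:
# firewall-cmd --zone=zone_name --add-service=samba
# firewall-cmd --zone=zone_name --add-service=samba --permanent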
When a node in the trusted storage pool fails, CTDB enables a different node to take over the virtual IP
addresses that the failed node was hosting. This ensures the IP addresses for the services provided are
always available.
IMPORTANT
On Red Hat Enterprise Linux 7, enable the CTDB firewall service in the active zones for
runtime and permanent mode using the below commands:
# firewall-cmd --get-active-zones
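A minimal sketch of opening the CTDB internode port, assuming zone_name is a placeholder:
# firewall-cmd --zone=zone_name --add-port=4379/tcp
# firewall-cmd --zone=zone_name --add-port=4379/tcp --permanent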
NOTE
Amazon Elastic Compute Cloud (EC2) does not support VIPs and is hence not
compatible with this solution.
Prerequisites
Follow these steps before configuring CTDB on a Red Hat Gluster Storage Server:
If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by
executing the following command:
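For example, using yum:
# yum remove ctdb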
After removing the older version, proceed with installing the latest CTDB.
NOTE
Ensure that the system is subscribed to the samba channel to get the latest
CTDB packages.
Install CTDB on all the nodes that are used as Samba servers to the latest version using the
following command:
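For example, using yum:
# yum install ctdb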
In a CTDB-based high availability environment of Samba, the locks will not be migrated on
failover.
Ensure that TCP port 4379 is open between the Red Hat Gluster Storage servers; this is the
internode communication port of CTDB.
1. Create a replicate volume. This volume will host only a zero byte lock file, hence choose
minimal sized bricks. To create a replicate volume run the following command:
where,
N: The number of nodes that are used as Samba servers. Each node must host one brick.
For example:
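A minimal sketch, assuming a volume named ctdb, four Samba server nodes, and hypothetical brick paths:
# gluster volume create ctdb replica 4 server1:/rhgs/brick1/ctdb server2:/rhgs/brick1/ctdb server3:/rhgs/brick1/ctdb server4:/rhgs/brick1/ctdb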
2. In the following files, replace "all" in the statement META="all" with the newly created volume
name:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
For example:
META="all"
to
META="ctdb"
3. In the /etc/samba/smb.conf file add the following line in the global section on all the nodes:
clustering=yes
4. Start the volume. The S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an
entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes
with the Samba server. It also enables automatic start of the CTDB service on reboot.
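A minimal sketch, assuming the lock volume is named ctdb as in the earlier step:
# gluster volume start ctdb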
NOTE
When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs
on all Red Hat Gluster Storage servers and removes an entry in /etc/fstab/
for the mount and unmounts the volume at /gluster/lock.
5. Verify that the file /etc/sysconfig/ctdb exists on all the nodes that are used as Samba servers.
This file contains Red Hat Gluster Storage recommended CTDB configurations.
6. Create the /etc/ctdb/nodes file on all the nodes that are used as Samba servers and add the IPs
of these nodes to the file.
10.16.157.0
10.16.157.3
10.16.157.6
The IPs listed here are the private IPs of Samba servers.
7. On all the nodes that are used as Samba servers and that require IP failover, create the
/etc/ctdb/public_addresses file and add the virtual IPs that CTDB should create to this file.
Add these IP addresses in the following format:
For example:
192.168.1.20/24 eth0
192.168.1.21/24 eth0
8. Start the CTDB service on all the nodes by executing the following command:
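A minimal sketch, assuming systemd (Red Hat Enterprise Linux 7); on Red Hat Enterprise Linux 6, use the service command instead:
# systemctl start ctdb
# service ctdb start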
1. Run the following command to allow Samba to communicate with brick processes even with
untrusted ports.
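A minimal sketch, assuming VOLNAME is a placeholder:
# gluster volume set VOLNAME server.allow-insecure on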
3. Edit the /etc/glusterfs/glusterd.vol in each Red Hat Gluster Storage node, and add
the following setting:
option rpc-auth-allow-insecure on
NOTE
This allows Samba to communicate with glusterd even with untrusted ports.
5. Run the following command to verify proper lock and I/O coherency.
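A minimal sketch, assuming the storage.batch-fsync-delay-usec option is the setting intended here and VOLNAME is a placeholder:
# gluster volume set VOLNAME storage.batch-fsync-delay-usec 0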
6. To verify if the volume can be accessed from the SMB/CIFS share, run the following command:
For example:
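A minimal sketch, assuming a hypothetical hostname:
# smbclient -L server1 -U%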
Server Comment
--------- -------
Workgroup Master
--------- -------
7. To verify if the SMB/CIFS share can be accessed by the user, run the following command:
For example:
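A minimal sketch, assuming hypothetical share and user names (shares exported by the hook scripts are named gluster-VOLNAME):
# smbclient //server1/gluster-test-volume -U username%password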
When a volume is started using the gluster volume start VOLNAME command, the volume is
automatically exported through Samba on all Red Hat Gluster Storage servers running Samba.
To be able to mount from any server in the trusted storage pool, repeat these steps on each Red Hat
Gluster Storage node. For more advanced configurations, refer to the Samba documentation.
1. Open the /etc/samba/smb.conf file in a text editor and add the following lines for a simple
configuration:
[gluster-VOLNAME]
comment = For samba share of volume VOLNAME
vfs objects = glusterfs
glusterfs:volume = VOLNAME
glusterfs:logfile = /var/log/samba/VOLNAME.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
# smbpasswd -a username
Specify the SMB password. This password is used during the SMB mount.
To allow a non-root user to read/write into the mounted volume, execute the following
steps:
1. Add the user on all the Samba servers based on your configuration:
# adduser username
2. Add the user to the list of Samba users on all Samba servers and assign password by executing
the following command:
# smbpasswd -a username
3. Perform a FUSE mount of the gluster volume on any one of the Samba servers:
For example:
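A minimal sketch, assuming placeholder server and volume names:
# mount -t glusterfs -o acl server1:/test-volume /mnt/glusterfs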
4. Provide required permissions to the user by executing appropriate setfacl command. For
example:
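A minimal sketch, assuming the user and mount point from the previous steps:
# setfacl -m user:username:rwx /mnt/glusterfs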
For example:
6.3.3.1. Manually Mounting Volumes Using SMB on Red Hat Enterprise Linux and Windows
To mount a Red Hat Gluster Storage volume manually using Server Message Block (SMB) on Red Hat
Enterprise Linux, execute the following steps:
2. Run mount -t cifs to mount the exported SMB share, using the syntax example as
guidance.
The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise
Linux 6.
For example:
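A minimal sketch, assuming hypothetical credentials and the gluster-VOLNAME share naming used in smb.conf (add sec=ntlmssp on Red Hat Enterprise Linux 6):
# mount -t cifs -o user=username,pass=password //server1/gluster-test-volume /mnt/smb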
To mount a Red Hat Gluster Storage volume manually using Server Message Block (SMB) on Microsoft
Windows using Windows Explorer, follow these steps:
1. In Windows Explorer, click Tools → Map Network Drive… to open the Map Network Drive
screen.
3. In the Folder text box, specify the path of the server and the shared resource in the following
format: \\SERVER_NAME\VOLNAME.
4. Click Finish to complete the process, and display the network drive in Windows Explorer.
To mount a Red Hat Gluster Storage volume manually using Server Message Block (SMB) on Microsoft
Windows using the command line, follow these steps:
2. Enter net use z: \\SERVER_NAME\VOLNAME, where z: is the drive letter to assign to the
shared volume.
6.3.3.2. Automatically Mounting Volumes Using SMB on Red Hat Enterprise Linux and
Windows
You can configure your system to automatically mount Red Hat Gluster Storage volumes using SMB on
Microsoft Windows-based clients each time the system starts.
Mounting a Volume Automatically on Server Start using SMB on Red Hat Enterprise Linux
To mount a Red Hat Gluster Storage Volume automatically using SMB at server start execute the
following steps:
1. Open the /etc/fstab file in a text editor and add a line containing the following details:
In the OPTIONS column, ensure that you specify the credentials option, with a value of the
path to the file that contains the username and/or password.
Using the example server names, the entry contains the following replaced values.
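A minimal sketch of the /etc/fstab entry, assuming hypothetical names and a credentials file at /etc/samba/passwd:
\\server1\gluster-test-volume /mnt/smb cifs credentials=/etc/samba/passwd,_netdev 0 0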
The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise
Linux 6, for example:
See the mount.cifs man page for more information about these options.
Mounting a Volume Automatically on Server Start using SMB through Microsoft Windows Explorer
To mount a Red Hat Gluster Storage volume automatically using Server Message Block (SMB) on Microsoft
Windows using Windows Explorer, follow these steps:
1. In Windows Explorer, click Tools → Map Network Drive… . to open the Map Network Drive
screen.
3. In the Folder text box, specify the path of the server and the shared resource in the following
format: \\SERVER_NAME\VOLNAME.
5. Click Finish to complete the process, and display the network drive in Windows Explorer.
6. If the Windows Security screen pops up, enter the username and password and click OK.
Verify the virtual IP (VIP) addresses of a shut down server are carried over to another server in the
replicated volume.
# ctdb status
# ctdb ip
# ctdb ping -n all
2. Mount a Red Hat Gluster Storage volume using any one of the VIPs.
When the Red Hat Gluster Storage server serving the VIP is shut down there will be a pause for
a few seconds, then I/O will resume.
1. On all Red Hat Gluster Storage Servers, with elevated privileges, navigate to
/var/lib/glusterd/hooks/1/start/post
For more information about these scripts, see Section 13.2, “Prepackaged Scripts”.
NOTE
To configure shadow copy, the following settings must be added or edited in the smb.conf file. The
smb.conf file is located at /etc/samba/smb.conf.
NOTE
For example:
[gluster-vol0]
comment = For samba share of volume vol0
vfs objects = shadow_copy2 glusterfs
glusterfs:volume = vol0
glusterfs:logfile = /var/log/samba/glusterfs-vol0.%M.log
glusterfs:loglevel = 3
path = /
read only = no
guest ok = yes
shadow:snapdir = /.snaps
shadow:basedir = /
shadow:sort = desc
shadow:snapprefix= ^S[A-Za-z0-9]*p$
shadow:format = _GMT-%Y.%m.%d-%H.%M.%S
The parameters shown in the above example must be added to the smb.conf file to enable shadow copy.
Not all of the options shown are mandatory.
Shadow copy filters the available snapshots based on the smb.conf entries and shows only those
snapshots that match the criteria. In the example above, a snapshot name must start with 'S', end with
'p', and contain only alphanumeric characters in between. For example, of the following snapshots, the
first two are shown by Windows and the last one is ignored. These options therefore control which
snapshots are exposed to clients and which are not.
Snap_GMT-2016.06.06-06.06.06
Sl123p_GMT-2016.07.07-07.07.07
xyz_GMT-2016.08.08-08.08.08
After editing the smb.conf file, execute the following steps to enable snapshot access:
2. Enable User Serviceable Snapshot (USS) for Samba. For more information see Section 8.13,
“User Serviceable Snapshots”
1. Right-click the file or directory for which the previous version is required.
3. In the dialog box, select the Date/Time of the previous version of the file, and select either
Open, Restore, or Copy.
where,
Open: Lets you open the required version of the file in read-only mode.
Enabling Metadata Caching to improve the performance of SMB access of Red Hat Gluster
Storage volumes.
More detailed information on each of these is provided in the sections ahead.
Enable metadata caching to improve the performance of directory operations. Execute the following
commands from any one of the nodes on the trusted storage pool in the order mentioned below.
NOTE
If majority of the workload is modifying the same set of files and directories
simultaneously from multiple clients, then enabling metadata caching might not provide
the desired performance improvement.
1. Execute the following command to enable metadata caching and cache invalidation:
This is a group set option that sets multiple volume options in a single command.
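A sketch of the command for this step, assuming the metadata-cache group profile and a volume named VOLNAME:
# gluster volume set VOLNAME group metadata-cache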
2. To increase the number of files that can be cached, execute the following command:
Here, n is set to 50000. It can be increased if the number of active files in the volume is very high.
Increasing this number increases the memory footprint of the brick processes.
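A sketch of the command for this step, assuming the network.inode-lru-limit option and a volume named VOLNAME:
# gluster volume set VOLNAME network.inode-lru-limit 50000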
The directory listing gets slower as the number of bricks/nodes increases in a volume, though the
file/directory numbers remain unchanged. By enabling the parallel readdir volume option, the
performance of directory listing is made independent of the number of nodes/bricks in the volume.
Thus, the increase in the scale of the volume does not reduce the directory listing performance.
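A sketch of how this could be enabled, assuming the performance.parallel-readdir volume option and a volume named VOLNAME:
# gluster volume set VOLNAME performance.parallel-readdir on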
NOTE
You can expect an increase in performance only if the distribute count of the volume is
2 or greater and the size of the directory is small (< 3000 entries). The larger the
volume (distribute count), the greater the performance benefit.
NOTE
Before creating or renaming any file, lookups (5-6 in SMB) are sent to verify whether the file already
exists. Serving these lookups from the cache when possible increases create and rename performance
severalfold for SMB access.
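A sketch of the command this section refers to, assuming the nl-cache group profile and a volume named VOLNAME:
# gluster volume set VOLNAME group nl-cache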
NOTE
The above command also enables cache-invalidation and increases the timeout
to 10 minutes.
This section covers how to view and set access control lists, and how to ensure this feature is enabled
on your Red Hat Gluster Storage volumes. For more detailed information about how ACLs work, see
the Red Hat Enterprise Linux 7 System Administrator's Guide:
https://access.redhat.com/documentation/en-
US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Access_Control_Lists.html.
The syntax of an access rule depends on which roles need to obey the rule.
For example, setfacl -m u:fred:rw /mnt/data gives the user fred read and write access to
the /mnt/data directory.
For example, setfacl -m g:admins:rwx /etc/fstab gives users in the admins group read,
write, and execute permissions to the /etc/fstab file.
For example, setfacl -m o:r /mnt/data/public gives users without any specific rules about
their username or group permission to read files in the /mnt/data/public directory.
Rules for setting a maximum access level using an effective rights mask start with m:
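For example, a command such as setfacl -m m:r-x /mnt/data limits the effective permissions granted by user and group entries on /mnt/data to read and execute (the directory name is illustrative).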
You can set the default ACLs for a directory by adding d: to the beginning of any rule, or make a rule
recursive with the -R option. For example, setfacl -Rm d:g:admins:rwx /etc gives all members
of the admins group read, write, and execute access to any file created under the /etc directory after
the point when setfacl is run.
# getfacl file_path
This prints a summary of current ACLs for that file. For example:
# getfacl /mnt/gluster/data/test/sample.jpg
# owner: antony
# group: antony
user::rw-
group::rw-
other::r--
If a directory has default ACLs set, these are prefixed with default:, like so:
# getfacl /mnt/gluster/data/doc
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
default:user:antony:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
ACLs are enabled by default on volumes mounted using the NFS and SMB access protocols. To check
whether ACLs are enabled on other mounted volumes, see Section 6.4.4, “Checking ACL enablement
on a mounted volume”.
Table 6.9. Checking ACL enablement

Native FUSE
Check the output of the mount command for the default_permissions option:
# mount | grep mountpoint
See Section 6.1, “Native Client” for more information.

Gluster Native NFS
On the server side, check the output of the gluster volume info volname command. If nfs.acl appears in the output, that volume has ACLs disabled. If nfs.acl does not appear, ACLs are enabled (the default state).
On the client side, check the output of the mount command for the volume. If noacl appears in the output, ACLs are disabled on the mount point. If this does not appear in the output, the client checks that the server uses ACLs, and uses ACLs if server support is enabled.
Refer to the output of gluster volume set help pertaining to NFS, or see the Red Hat Enterprise Linux Storage Administration Guide for more information:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html

NFS Ganesha
On the server side, check the volume's export configuration file, /run/gluster/shared_storage/nfs-ganesha/exports/export.volname.conf. If the Disable_ACL option is set to true, ACLs are disabled. Otherwise, ACLs are enabled for that volume.
NOTE
NFS-Ganesha supports NFSv4 protocol standardized ACLs but not the NFSACL protocol used for NFSv3 mounts. Only NFSv4 mounts can set ACLs.
See Section 6.2.3, “NFS Ganesha” for more information. For client side settings, refer to the Red Hat Enterprise Linux Storage Administration Guide:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html

Samba
POSIX ACLs are enabled by default when using Samba to access a Red Hat Gluster Storage volume. See Section 6.3, “SMB” for more information.
Red Hat Gluster Storage is based on glusterFS, an open source distributed file system. Object Store
technology is built upon OpenStack Swift. OpenStack Swift allows users to store and retrieve files and
content through a simple Web Service REST (Representational State Transfer) interface as objects.
Red Hat Gluster Storage uses glusterFS as the back-end file system for OpenStack Swift, and leverages
OpenStack Swift's REST interface for storing and retrieving files over the web, combined with glusterFS
features such as scalability, high availability, replication, and elastic volume management for data
management at the disk level.
Object Store technology enables enterprises to adopt and deploy cloud storage solutions. It allows
users to access and modify data as objects from a REST interface along with the ability to access and
modify files from NAS interfaces. In addition to decreasing cost and making it faster and easier to
access object data, it also delivers massive scalability, high availability and replication of object
storage. Infrastructure as a Service (IaaS) providers can utilize Object Store technology to enable their
own cloud storage service. Enterprises can use this technology to accelerate the process of preparing
file-based applications for the cloud and simplify new application development for cloud computing
environments.
OpenStack Swift is an open source software for creating redundant, scalable object storage using
clusters of standardized servers to store petabytes of accessible data. It is not a file system or real-
time data storage system, but rather a long-term storage system for a more permanent type of static
data that can be retrieved, leveraged, and updated.
OpenStack Swift and Red Hat Gluster Storage integration consists of:
For detailed information on Object Storage, see OpenStack Object Storage Administration
Guide available at: http://docs.openstack.org/admin-guide-cloud/content/ch_admin-
openstack-object-storage.html.
Red Hat Gluster Storage environment consists of bricks that are used to build volumes. For
more information on bricks and volumes, see Section 5.4, “Formatting and Mounting Bricks” .
The following diagram illustrates OpenStack Object Storage integration with Red Hat Gluster Storage:
IMPORTANT
On Red Hat Enterprise Linux 7, enable the Object Store firewall service in the active
zones for runtime and permanent mode using the following commands:
# firewall-cmd --get-active-zones
Add the port number 443 only if your swift proxy server is configured with SSL. To add
the port number, run the following commands:
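A sketch of the firewall commands, assuming the default Object, Container, Account, and Proxy ports of 6010, 6011, 6012, and 8080 and an active zone named zone_name (adjust both to your configuration):
# firewall-cmd --zone=zone_name --add-port=6010/tcp --add-port=6011/tcp --add-port=6012/tcp --add-port=8080/tcp
# firewall-cmd --zone=zone_name --add-port=6010/tcp --add-port=6011/tcp --add-port=6012/tcp --add-port=8080/tcp --permanent
# firewall-cmd --zone=zone_name --add-port=443/tcp
# firewall-cmd --zone=zone_name --add-port=443/tcp --permanent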
Proxy Server
The Proxy Server is responsible for connecting to the rest of the OpenStack Object Storage
architecture. For each request, it looks up the location of the account, container, or object in the ring
and routes the request accordingly. The public API is also exposed through the proxy server. When
objects are streamed to or from an object server, they are streamed directly through the proxy server
to or from the user – the proxy server does not spool them.
The Ring
The Ring maps swift accounts to the appropriate Red Hat Gluster Storage volume. When other
components need to perform any operation on an object, container, or account, they need to interact
with the Ring to determine the correct Red Hat Gluster Storage volume.
An object is the basic storage entity and any optional metadata that represents the data you store.
When you upload data, the data is stored as-is (with no compression or encryption).
The Object Server is a very simple storage server that can store, retrieve, and delete objects stored on
local devices.
A container is a storage compartment for your data and provides a way for you to organize your data.
Containers can be visualized as directories in a Linux system. However, unlike directories, containers
cannot be nested. Data must be stored in a container and hence the objects are created within a
container.
The Container Server’s primary job is to handle listings of objects. The listing is done by querying the
glusterFS mount point with a path. This query returns a list of all files and directories present under
that container.
The OpenStack Swift system is designed to be used by many different storage consumers.
The Account Server is very similar to the Container Server, except that it is responsible for listing
containers rather than objects. In Object Store, each Red Hat Gluster Storage volume is an account.
Object Store provides an option of using an authentication service to authenticate and authorize user
access. Once the authentication service correctly identifies the user, it will provide a token which must
be passed to Object Store for all subsequent container and object operations.
In addition to your own authentication services, the following authentication services are
supported by Object Store: Keystone, GSwauth, and TempAuth.
Each Red Hat Gluster Storage volume is mapped to a single account. Each account can have
multiple users with different privileges based on the group and role they are assigned to. After
authenticating using accountname:username and password, user is issued a token which will be
used for all subsequent REST requests.
When working with Keystone, account names are defined by Keystone as the tenant id. You
must create the Red Hat Gluster Storage volume using the Keystone tenant id as the name
of the volume. This means, you must create the Keystone tenant before creating a Red Hat
Gluster Storage Volume.
IMPORTANT
Red Hat Gluster Storage does not contain any Keystone server components. It
only acts as a Keystone client. After you create a volume for Keystone, ensure
to export this volume for accessing it using the object storage interface. For
more information on exporting volume, see Section 6.5.7.8, “Exporting the Red
Hat Gluster Storage Volumes”.
To protect the metadata, the Red Hat Gluster Storage volume should only be able to be
mounted by the systems running the proxy servers. For more information on mounting
volumes, see Chapter 6, Creating Access to Volumes.
High availability
Scalability
Replication
6.5.4. Limitations
This section lists the limitations of using Red Hat Gluster Storage Object Store:
Object Name
Object Store imposes the following constraints on the object name to maintain the
compatibility with network file access:
Object names must not be prefixed or suffixed by a '/' character. For example, a/b/
Object names must not have contiguous multiple '/' characters. For example, a//b
Account Management
Object Store does not allow account management even though OpenStack Swift allows the
management of accounts. This limitation is because Object Store treats accounts
equivalent to the Red Hat Gluster Storage volumes.
Object Store does not support account names (i.e. Red Hat Gluster Storage volume names)
having an underscore.
In Object Store, every account must map to a Red Hat Gluster Storage volume.
Subdirectory Listing
Subject to the limitations mentioned in Section 6.5.4, “Limitations”, the following table describes the
support status for current Swift API’s functional features:
Feature Status
Authentication Supported
6.5.6. Prerequisites
Ensure that you do the following before using Red Hat Gluster Storage Object Store.
Ensure that the openstack-swift-* and swiftonfile packages have matching version numbers.
Ensure that the gluster-swift services are owned by and run as the root user, not the swift
user as in a typical OpenStack installation.
# cd /usr/lib/systemd/system
# sed -i s/User=swift/User=root/ openstack-swift-proxy.service
openstack-swift-account.service openstack-swift-container.service
openstack-swift-object.service openstack-swift-object-
expirer.service
Ensure that the ports for the Object, Container, Account, and Proxy servers are open. Note
that the ports used for these servers are configurable. The ports listed in Table 6.11, “Ports
required for Red Hat Gluster Storage Object Store” are the default values.
Table 6.11. Ports required for Red Hat Gluster Storage Object Store
Server Port
Create and mount a Red Hat Gluster Storage volume for use as a Swift Account. For
information on creating Red Hat Gluster Storage volumes, see Chapter 5, Setting Up Storage
Volumes . For information on mounting Red Hat Gluster Storage volumes, see Chapter 6,
Creating Access to Volumes .
WARNING
When you install Red Hat Gluster Storage 3.2 or higher, the /etc/swift directory contains both *.conf
and *.conf-gluster files. You must delete the *.conf files and create new configuration files based on
the *.conf-gluster templates. Otherwise, inappropriate python packages will be loaded and the
component may not work as expected.
If you are upgrading to Red Hat Gluster Storage 3.2 or higher, the older configuration files will be
retained and new configuration files will be created with the .rpmnew extension. You must delete the
.conf files and folders (account-server, container-server, and object-server) for better understanding
of the loaded configuration.
By default, the proxy server only handles HTTP requests. To configure the proxy server to process
HTTPS requests, perform the following steps:
1. Create a self-signed certificate:
# cd /etc/swift
# openssl req -new -x509 -nodes -out cert.crt -keyout cert.key
2. Add the following lines to the [DEFAULT] section of the proxy server configuration file
(/etc/swift/proxy-server.conf):
bind_port = 443
cert_file = /etc/swift/cert.crt
key_file = /etc/swift/cert.key
IMPORTANT
When Object Storage is deployed on two or more machines, not all nodes in your trusted
storage pool are used. Installing a load balancer enables you to utilize all the nodes in
your trusted storage pool by distributing the proxy server requests equally to all
storage nodes.
Memcached allows nodes' states to be shared across multiple proxy servers. Edit the
memcache_servers configuration option in the proxy-server.conf and list all
memcached servers.
[filter:cache]
use = egg:swift#memcache
memcache_servers =
192.168.1.20:11211,192.168.1.21:11211,192.168.1.22:11211
The memcached server listens on port number 11211. Ensure that you use the same server
sequence in all configuration files.
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache authtoken
keystoneauth proxy-logging proxy-server
[filter:authtoken]
paste.filter_factory =
keystoneclient.middleware.auth_token:filter_factory
signing_dir = /etc/swift
auth_host = keystone.server.com
auth_port = 35357
auth_protocol = http
auth_uri = http://keystone.server.com:5000
# if its defined
admin_tenant_name = services
admin_user = swift
admin_password = adminpassword
delay_auth_decision = 1
[filter:keystoneauth]
use = egg:swift#keystoneauth
operator_roles = admin, SwiftOperator
is_admin = true
cache = swift.cache
Integrating GSwauth
Perform the following steps to integrate GSwauth:
1. Create and start a Red Hat Gluster Storage volume to store metadata.
For example:
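A minimal sketch, assuming a single-brick volume and an illustrative brick path:
# gluster volume create gsmetadata server1:/rhgs/brick1/gsmetadata
# gluster volume start gsmetadata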
2. Run gluster-swift-gen-builders tool with all the volumes to be accessed using the
Swift client including gsmetadata volume:
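A sketch, assuming two illustrative data volumes named testvol1 and testvol2:
# gluster-swift-gen-builders testvol1 testvol2 gsmetadata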
[pipeline:main]
pipeline = catch_errors cache gswauth proxy-server
[filter:gswauth]
use = egg:gluster_swift#gswauth
set log_name = gswauth
super_admin_key = gswauthkey
metadata_volume = gsmetadata
auth_type = sha1
auth_type_salt = swauthsalt
IMPORTANT
Advanced Options:
You can set the following advanced options for GSwauth WSGI filter:
default-swift-cluster: The default storage-URL for the newly created accounts. When you
attempt to authenticate for the first time, the access token and the storage-URL where data
for the given account is stored will be returned.
token_life: The default token life. The default value is 86400 seconds (24 hours).
max_token_life: The maximum token life. You can set a token lifetime when requesting a new
token with header x-auth-token-lifetime. If the passed in value is greater than the
max_token_life, then the max_token_life value will be used.
-A, --admin-url: The URL to the auth. The default URL is http://127.0.0.1:8080/auth/.
-U, --admin-user: The user with administrator rights to perform action. The default user role is
.super_admin.
-K, --admin-key: The key for the user with administrator rights to perform the action. There is
no default value.
# gswauth-prep [option]
For example:
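A sketch, assuming the super_admin_key configured above and the default admin URL:
# gswauth-prep -A http://127.0.0.1:8080/auth/ -K gswauthkey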
Creating Accounts
Create an account for GSwauth. This account is mapped to a Red Hat Gluster Storage volume.
For example:
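A sketch, assuming the gswauth-add-account helper, the super admin key configured earlier, and an illustrative account (volume) named test:
# gswauth-add-account -K gswauthkey test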
Deleting an Account
Ensure that all users belonging to this account are deleted before you delete the account.
To delete an account:
For example:
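A sketch, assuming the gswauth-delete-account helper and the same illustrative account:
# gswauth-delete-account -K gswauthkey test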
User Roles
The following user roles are supported in GSwauth:
A regular user has no rights. Users must be given both read and write privileges using Swift
ACLs.
The admin user is a super-user at the account level. This user can create and delete users for
that account. These members have both read and write privileges to all stored objects in
that account.
The reseller admin user is a super-user at the cluster level. This user can create and
delete accounts and users, and has read and write privileges to all accounts under that cluster.
GSwauth maintains its own Swift account to store all of its metadata on accounts and users.
The .super_admin role provides access to GSwauth's own Swift account and has all privileges
to act on any other account or user.
.super_admin (username): Get Account List, Create Account, Delete Account
.reseller_admin (group): Get Account List, Create Account, Delete Account
.admin (group): Get Account Details
Creating Users
You can create a user for an account that does not exist; the account is created before the user is
created.
You must add -r flag to create a reseller admin user and -a flag to create an admin user. To
change the password or role of the user, you can run the same command with the new option.
For example
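A sketch, assuming the gswauth-add-user helper, the illustrative account test, a user named ana with password anapwd, and the -a flag to make the user an account admin:
# gswauth-add-user -K gswauthkey -a test ana anapwd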
Deleting a User
Delete a user by running the following command:
For example
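A sketch, assuming the gswauth-delete-user helper and the same illustrative account and user:
# gswauth-delete-user -K gswauthkey test ana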
The second method is a two-step process: first, authenticate with a username and password to obtain
a token and the storage URL; then, make object requests to the storage URL with the given token.
It is important to remember that tokens expire, so the authentication process needs to be repeated
regularly.
Now, you use the given token and storage URL to access the object-storage using the Swift client:
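A sketch of the two steps, assuming the illustrative test account and ana user from above; the first request returns the token in the X-Auth-Token response header and the storage URL in the X-Storage-Url header, which are then passed to the Swift client:
# curl -v -H 'X-Auth-User: test:ana' -H 'X-Auth-Key: anapwd' http://127.0.0.1:8080/auth/v1.0
# swift --os-auth-token=<token> --os-storage-url=<storage URL> list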
IMPORTANT
Reseller admins must always use the second method to acquire a token in order to
access accounts other than their own. The first method, using the username and
password, gives them access only to their own accounts.
For example:
If [account] and [user] are omitted, all the accounts will be listed.
If [account] is included but not [user], a list of users within that account will be listed.
If [account] and [user] are included, a list of groups that the user belongs to will be listed.
If the [user] is .groups, the active groups for that account will be listed.
The default output format is in tabular format. Adding -p option provides the output in plain text
format, -j provides the output in JSON format.
You also have the option to provide the expected life of tokens, delete all tokens or delete all tokens
for a given account.
# gswauth-cleanup-tokens [options]
For example
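A sketch, assuming the super admin key used above and the illustrative test account:
# gswauth-cleanup-tokens -K gswauthkey --purge test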
The tokens are deleted from disk, but they may still persist in memcached.
You can add the following options while cleaning up the tokens:
-t, --token-life: The expected life of tokens. Token objects modified before the given number
of seconds will be checked for expiration (default: 86400).
--purge: Purges all the tokens for a given account whether the tokens have expired or not.
--purge-all: Purges all the tokens for all the accounts and users whether the tokens have
expired or not.
WARNING
TempAuth authentication service must only be used in test deployments and not
for production.
TempAuth is automatically installed when you install Red Hat Gluster Storage. TempAuth stores user
and password information as cleartext in a single proxy-server.conf file. In your
/etc/swift/proxy-server.conf file, enable TempAuth in pipeline and add user information in
TempAuth section by referencing the below example.
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache tempauth proxy-
logging proxy-server
[filter:tempauth]
use = egg:swift#tempauth
user_admin_admin = admin.admin.reseller_admin
user_test_tester = testing .admin
user_test_tester2 = testing2
Here the accountname is the Red Hat Gluster Storage volume used to store objects.
You must restart the Object Store services for the configuration changes to take effect. For
information on restarting the services, see Section 6.5.7.9, “Starting and Stopping Server” .
Create a new configuration file /etc/swift/swift.conf by referencing the template file available
at /etc/swift/swift.conf-gluster.
The Object Expiration feature allows you to schedule automatic deletion of objects that are stored in
the Red Hat Gluster Storage volume. You can use the object expiration feature to specify a lifetime for
specific objects in the volume; when the lifetime of an object expires, the object store would
automatically quit serving that object and would shortly thereafter remove the object from the Red
Hat Gluster Storage volume. For example, you might upload logs periodically to the volume, and you
might need to retain those logs for only a specific amount of time.
The client uses the X-Delete-At or X-Delete-After headers during an object PUT or POST and the Red
Hat Gluster Storage volume would automatically quit serving that object.
NOTE
Expired objects appear in container listings until they are deleted by the object-
expirer daemon. This is an expected behavior.
A DELETE object request on an expired object would delete the object from Red Hat
Gluster Storage volume (if it is yet to be deleted by the object expirer daemon).
However, the client would get a 404 (Not Found) status in return. This is also an
expected behavior.
Object expirer uses a separate account (a Red Hat Gluster Storage volume) named gsexpiring for
managing object expiration. Hence, you must create a Red Hat Gluster Storage volume and name it as
gsexpiring.
When you use the X-Delete-At or X-Delete-After headers during an object PUT or POST, the object is
scheduled for deletion. The Red Hat Gluster Storage volume would automatically quit serving that
object at the specified time and will shortly thereafter remove the object from the Red Hat Gluster
Storage volume.
Use PUT operation while uploading a new object. To assign expiration headers to existing objects, use
the POST operation.
X-Delete-At header
The X-Delete-At header requires a UNIX epoch timestamp, in integer form. For example, 1418884120
represents Thu, 18 Dec 2014 06:28:40 GMT. By setting the header to a specific epoch time, you indicate
when you want the object to expire, not be served, and be deleted completely from the Red Hat Gluster
Storage volume. The current time in Epoch notation can be found by running this command:
$ date +%s
Set the object expiry time during an object PUT with X-Delete-At header using cURL:
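A sketch of such a cURL request, reusing the token and storage URL shown in the swift example below and an illustrative object name:
# curl -v -X PUT -T ./localfile -H 'X-Auth-Token: AUTH_tk99a39aecc3dd4f80b2b1e801d00df846' -H 'X-Delete-At: 1392013619' http://127.0.0.1:8080/v1/AUTH_test/container1/localfile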
Set the object expiry time during an object PUT with X-Delete-At header using swift client:
# swift --os-auth-token=AUTH_tk99a39aecc3dd4f80b2b1e801d00df846 --
os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1
./localfile --header 'X-Delete-At: 1392013619'
X-Delete-After
The X-Delete-After header takes an integer number of seconds that represents the amount of time
from now when you want the object to be deleted.
Set the object expiry time with an object PUT with X-Delete-After header using cURL:
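A sketch of such a cURL request, using the same illustrative token, storage URL, and object name as above:
# curl -v -X PUT -T ./localfile -H 'X-Auth-Token: AUTH_tk99a39aecc3dd4f80b2b1e801d00df846' -H 'X-Delete-After: 3600' http://127.0.0.1:8080/v1/AUTH_test/container1/localfile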
Set the object expiry time with an object PUT with X-Delete-After header using swift client:
# swift --os-auth-token=AUTH_tk99a39aecc3dd4f80b2b1e801d00df846 --
os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1
./localfile --header 'X-Delete-After: 3600'
The object-expirer service runs once in every 300 seconds, by default. You can modify the duration by
configuring interval option in /etc/swift/object-expirer.conf file. For every pass it makes,
it queries the gsexpiring account for tracker objects. Based on the timestamp and path present in the
name of tracker objects, object-expirer deletes the actual object and the corresponding tracker
object.
To run the object-expirer service once, execute the following command:
# swift-object-expirer -o -v /etc/swift/object-expirer.conf
After creating configuration files, you must now add configuration details for the system to identify the
Red Hat Gluster Storage volumes to be accessible as Object Store. These configuration details are
added to the ring files. The ring files provide the list of Red Hat Gluster Storage volumes to be
accessible using the object storage interface to the Swift on File component.
Create the ring files for the current configurations by running the following command:
# cd /etc/swift
# gluster-swift-gen-builders VOLUME [VOLUME...]
For example,
# cd /etc/swift
# gluster-swift-gen-builders testvol1 testvol2 testvol3
Here testvol1, testvol2, and testvol3 are the Red Hat Gluster Storage volumes which will be mounted
locally under the directory mentioned in the object, container, and account configuration files (default
value is /mnt/gluster-object). The default value can be changed to a different path by changing
the devices configurable option across all account, container, and object configuration files. The
path must contain Red Hat Gluster Storage volumes mounted under directories having the same
names as volume names. For example, if devices option is set to /home, it is expected that the
volume named testvol1 be mounted at /home/testvol1.
Note that all the volumes required to be accessed using the Swift interface must be passed to the
gluster-swift-gen-builders tool even if it was previously added. The gluster-swift-gen-
builders tool creates new ring files every time it runs successfully.
To remove a VOLUME, run gluster-swift-gen-builders only with the volumes which are required
to be accessed using the Swift interface.
For example, to remove the testvol2 volume, run the following command:
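Following the earlier example, a sketch that keeps only testvol1 and testvol3:
# cd /etc/swift
# gluster-swift-gen-builders testvol1 testvol3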
You must restart the Object Store services after creating the new ring files.
You must start or restart the server manually whenever you update or modify the configuration files.
These processes must be owned and run by the root user.
# chkconfig memcached on
# chkconfig openstack-swift-proxy on
# chkconfig openstack-swift-account on
# chkconfig openstack-swift-container on
# chkconfig openstack-swift-object on
# chkconfig openstack-swift-object-expirer on
Configuring the gluster-swift services to start at boot time by using the systemctl command may
require additional configuration. Refer to https://access.redhat.com/solutions/2043773 for details if
you encounter problems.
IMPORTANT
You must restart all Object Store services servers whenever you change the
configuration and ring files.
Creating containers and objects in Red Hat Gluster Storage Object Store is very similar to OpenStack
Swift. For more information on Swift operations, see the OpenStack Object Storage API Reference Guide
available at http://docs.openstack.org/api/openstack-object-storage/1.0/content/.
You can create a subdirectory object under a container using the headers Content-Type:
application/directory and Content-Length: 0. However, the current behavior of Object Store
returns 200 OK on a GET request on subdirectory but this does not list all the objects under that
subdirectory.
Swift ACLs work with users and accounts. ACLs are set at the container level and support lists for read
and write access. For more information on Swift ACLs, see http://docs.openstack.org/user-
guide/content/managing-openstack-object-storage-with-swift-cli.html.
Check the operating versions of the clients connected to a given volume by running the following
command:
Use all in place of the name of your volume if you want to see the operating versions of
clients connected to all volumes in the cluster.
1. Perform a state dump for the volume whose clients you want to check.
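A sketch of the state dump command, assuming a volume named VOLNAME:
# gluster volume statedump VOLNAME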
# gluster --print-statedumpdir
3. Locate the state dump file and grep for client information.
CHAPTER 7. INTEGRATING RED HAT GLUSTER STORAGE WITH WINDOWS ACTIVE DIRECTORY
This section assumes that you have an Active Directory domain installed. Before proceeding with the
configuration details, the following is a list of data, along with examples, that is used in the sections
ahead.
Table 7.1.
7.1. PREREQUISITES
Before integration, the following steps have to be completed on an existing Red Hat Gluster Storage
environment:
Name Resolution
The Red Hat Gluster Storage nodes must be able to resolve names from the AD domain via
DNS. To verify the same you can use the following command:
host dc1.addom.example.com
where, addom.example.com is the AD domain and dc1 is the name of a domain controller.
For example, the /etc/resolv.conf file in a static network configuration could look like
this:
domain addom.example.com
search addom.example.com
nameserver 10.11.12.1 # dc1.addom.example.com
nameserver 10.11.12.2 # dc2.addom.example.com
This example assumes that both the domain controllers are also the DNS servers of the
domain.
Kerberos Packages
If you want to use the kerberos client utilities, like kinit and klist, then manually install the
krb5-workstation using the following command:
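For example, using yum:
# yum install krb5-workstation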
On each Red Hat Storage node, edit the file /etc/ntp.conf so the time is synchronized from a
known, reliable time service:
Activate the change on each Red Hat Gluster Storage node by stopping the ntp daemon,
updating the time, then starting the ntp daemon. Verify the change on both servers using the
following commands:
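A sketch of those commands, assuming the ntpd service and an illustrative upstream server of pool.ntp.org:
# service ntpd stop
# ntpdate pool.ntp.org
# service ntpd start
# ntpq -p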
Samba Packages
Ensure to install the following Samba packages along with its dependencies:
CTDB
samba
samba-client
samba-winbind
samba-winbind-modules
7.2. INTEGRATION
Integrating Red Hat Gluster Storage Servers into an Active Directory domain involves the following
series of steps:
1. Configure Authentication
NOTE
Ensure that CTDB is configured before the Active Directory join. For more
information, see Section 6.3.1, Setting up CTDB for Samba, in the Red Hat Gluster
Storage Administration Guide.
The Samba configuration file /etc/samba/smb.conf has to contain the relevant parameters for AD.
Along with that, a few other settings are required in order to activate mapping of user and group IDs.
The following example depicts the minimal Samba configuration for AD integration:
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = tdb
idmap config * : range = 1000000-1999999

include = /etc/samba/rhs-samba.conf
WARNING
Make sure to edit the smb.conf file such that the above is the complete global
section in order to prevent gluster mechanisms from changing the above settings
when starting or stopping the ctdb lock volume.
The netbios name consists of only one name which has to be the same name on all cluster nodes.
Windows clients will only access the cluster via that name (either in this short form or as an FQDN).
The individual node hostname (rhs-srv1, rhs-srv2, …) must not be used for the netbios name
parameter.
NOTE
The idmap range is an example. This range should be chosen big enough to
cover all objects that can possibly be mapped.
If you want to be able to use the individual host names to also access specific
nodes, you can add them to the netbios aliases parameter of smb.conf.
It is also possible to further adapt Samba configuration to meet special needs or to specific properties
of the AD environment. For example, the ID mapping scheme can be changed. Samba offers many
methods for doing id-mapping. One popular way to set up ID mapping in an active directory
environment is to use the idmap_ad module which reads the unix IDs from the AD's special unix
attributes. This has to be configured by the AD domain's administrator before it can be used by Samba
and winbind.
In order for Samba to use idmap_ad, the AD domain administrator has to prepare the AD domain for
using the so-called UNIX extensions and assign UNIX IDs to all users and groups that should be able to
access the Samba server.
Other possible idmap backends are rid and autorid and the default tdb. The smb.conf manpage
and the manpages for the various idmap modules contain all the details.
For example, following is an extended Samba configuration file to use the idmap_ad back-end for the
ADDOM domain.
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = tdb
idmap config * : range = 1000000-1999999
idmap config ADDOM : backend = ad
idmap config ADDOM : range = 3000000-3999999
idmap config addom : schema mode = rfc2307
winbind nss info = rfc2307
include = /etc/samba/rhs-samba.conf
NOTE
The schema mode and the winbind nss info setting should have the same value.
If the domain is at level 2003R2 or newer, then rfc2307 is the correct value. For
older domains, additional values sfu and sfu20 are available. See the manual
pages of idmap_ad and smb.conf for further details.
Parameter Description
Test the new configuration file using the testparm command. For example:
# testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Loaded services file OK.
# Global parameters
[global]
workgroup = ADDOM
realm = addom.example.com
netbios name = RHS-SMB
security = ADS
clustering = Yes
winbind nss info = rfc2307
idmap config addom : schema mode = rfc2307
idmap config addom : range = 3000000-3999999
idmap config addom : backend = ad
idmap config * : range = 1000000-1999999
idmap config * : backend = tdb
Once the Samba configuration has been made, Samba has to be enabled to use the mapped users and
groups from AD. This is achieved via the local Name Service Switch (NSS) that has to be made aware of
the winbind. To use the winbind NSS module, edit the /etc/nsswitch.conf file. Make sure the file
contains the winbind entries for the passwd and group databases. For example:
...
passwd: files winbind
group: files winbind
...
This will enable the use of winbind and should make users and groups visible on the individual
cluster node once Samba is joined to AD and winbind is started.
NOTE
If your configuration has CTDB managing Winbind and Samba, they can be
temporarily disabled with the following commands (to be executed prior to the
above stop commands) so as to prevent CTDB going into an unhealthy state
when they are shut down:
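A sketch of those commands, assuming the event scripts are named 49.winbind and 50.samba (check the actual names under /etc/ctdb/events.d/ on your nodes):
# ctdb disablescript 49.winbind
# ctdb disablescript 50.samba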
For some versions of RHGS, a bug in the selinux policy prevents 'ctdb
disablescript SCRIPT' from succeeding. If this is the case, 'chmod -x
/etc/ctdb/events.d/SCRIPT' can be executed as a workaround from a root shell.
Shutting down winbind and smb is primarily to prevent access to SMB services
during this AD integration. These services may be left running but access to
them should be prevented through some other means.
The join is initiated via the net utility from a single node:
WARNING
The following step must be executed only on one cluster node and should not be
repeated on other cluster nodes. CTDB makes sure that the whole cluster is joined
by this step.
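A sketch of the join command, assuming an AD administrator account named Administrator:
# net ads join -U Administrator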
Once the join is successful, the cluster ip addresses and the cluster netbios name should be made
public in the network. For registering multiple public cluster IP addresses in the AD DNS server, the
net utility can be used again:
# net ads dns register rhs-smb <PUBLIC IP 1> <PUBLIC IP 2> ...
This command ensures that the DNS name rhs-smb resolves to the given public IP addresses. The
DNS registrations use the cluster machine account for authentication in AD, which means this
operation can only be done after the join has succeeded.
Registering the NetBIOS name of the cluster is done by the nmbd service. In order to make sure that
the nmbd instances on the hosts don’t overwrite each other’s registrations, the ‘cluster addresses’
smb.conf option should be set to the list of public addresses of the whole cluster.
NOTE
If you previously disabled CTDB’s ability to manage Winbind and Samba they can
be re-enabled with the following commands:
For some versions of RHGS, a bug in the selinux policy prevents 'ctdb
enablescript SCRIPT' from succeeding. If this is the case, 'chmod +x
/etc/ctdb/events.d/SCRIPT' can be executed as a workaround from a root shell.
Ensure that the winbind starts after a reboot. This is achieved by adding
‘CTDB_MANAGES_WINBIND=yes’ to the /etc/sysconfig/ctdb file on all nodes.
Verify the join to check if the created machine account can be used to authenticate to the AD
LDAP server using the following command:
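A likely command for this check, assuming the standard net utility:
# net ads testjoin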
2. Execute the following command to display the machine account’s LDAP object
instanceType: 4
whenCreated: 20150922013713.0Z
whenChanged: 20151126111120.0Z
displayName: RHS-SMB$
uSNCreated: 221763
uSNChanged: 324438
name: rhs-smb
objectGUID: a178177e-4aa4-4abc-9079-d1577e137723
userAccountControl: 69632
badPwdCount: 0
codePage: 0
countryCode: 0
badPasswordTime: 130880426605312806
lastLogoff: 0
lastLogon: 130930100623392945
localPolicyFlags: 0
pwdLastSet: 130930098809021309
primaryGroupID: 515
objectSid: S-1-5-21-2562125317-1564930587-1029132327-1196
accountExpires: 9223372036854775807
logonCount: 1821
sAMAccountName: rhs-smb$
sAMAccountType: 805306369
dNSHostName: rhs-smb.addom.example.com
servicePrincipalName: HOST/rhs-smb.addom.example.com
servicePrincipalName: HOST/RHS-SMB
objectCategory:
CN=Computer,CN=Schema,CN=Configuration,DC=addom,DC=example,DC=com
isCriticalSystemObject: FALSE
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 130929563322279307
msDS-SupportedEncryptionTypes: 31
3. Execute the following command to display general information about the AD server:
Execute the following command to verify if winbindd can use the machine account for
authentication to AD
# wbinfo -t
checking the trust secret for domain ADDOM via RPC calls succeeded
5. Execute the following command to resolve the given name to a Windows SID
# wbinfo -a 'ADDOM\user'
Enter ADDOM\user's password:
plaintext password authentication succeeded
Enter ADDOM\user's password:
challenge/response password authentication succeeded
or,
# wbinfo -a 'ADDOM\user%password'
plaintext password authentication succeeded
challenge/response password authentication succeeded
8. Execute the following command to verify if the winbind Name Service Switch module works
correctly:
9. Execute the following command to verify if samba can use winbind and the NSS module
correctly:
Server Comment
--------- -------
RHS-SMB Samba 4.2.4
Workgroup Master
--------- -------
ADDOM RHS-SMB
PART IV. MANAGE
CHAPTER 8. MANAGING SNAPSHOTS
In the Snapshot Architecture diagram, the Red Hat Gluster Storage volume consists of multiple bricks
(Brick1, Brick2, and so on) spread across one or more nodes, and each brick is made up of an
independent thin Logical Volume (LV). When a snapshot of the volume is taken, a snapshot of each LV
is taken and another brick is created. Brick1_s1 is an identical image of Brick1. Similarly, identical
images of each brick are created, and these newly created bricks combine to form the snapshot volume.
Crash Consistency
A crash consistent snapshot is captured at a particular point-in-time. When a crash consistent
snapshot is restored, the data is identical as it was at the time of taking a snapshot.
NOTE
Online Snapshot
Snapshot is an online snapshot hence the file system and its associated data continue to be
available for the clients even while the snapshot is being taken.
Quorum Based
The quorum feature ensures that the volume is in a good condition while bricks are down. In an
n-way replication where n <= 2, quorum is not met if any brick is down. In an n-way replication
where n >= 3, quorum is met when m bricks are up: m >= (n/2 + 1) when n is odd, and m >= n/2
with the first brick up when n is even. If quorum is not met, snapshot creation fails.
NOTE
Barrier
To guarantee crash consistency, some of the fops (file operations) are blocked during a snapshot
operation. These fops are blocked until the snapshot is complete; all other fops are passed through.
There is a default timeout of 2 minutes: if the snapshot is not complete within that time, the fops are
unbarriered. If the barrier is lifted before the snapshot is complete, the snapshot operation fails. This
is to ensure that the snapshot is in a consistent state.
NOTE
Taking a snapshot of a Red Hat Gluster Storage volume that is hosting the Virtual
Machine Images is not recommended. Taking a Hypervisor assisted snapshot of a virtual
machine would be more suitable in this use case.
8.1. PREREQUISITES
Before using this feature, ensure that the following prerequisites are met:
Snapshots are based on thinly provisioned LVM. Ensure the volume is based on LVM2. Red Hat
Gluster Storage is supported on Red Hat Enterprise Linux 6.7 and later and Red Hat Enterprise
Linux 7.1 and later; both of these versions of Red Hat Enterprise Linux are based on LVM2 by
default. For more information, see https://access.redhat.com/site/documentation/en-
US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/thinprovisioned_volum
The logical volume which contains the brick must not contain any data other than the brick.
Only linear LVM is supported with Red Hat Gluster Storage. For more information, see
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/4/html-
single/Cluster_Logical_Volume_Manager/#lv_overview
Each snapshot creates as many bricks as in the original Red Hat Gluster Storage volume.
Bricks, by default, use privileged ports to communicate. The total number of privileged ports in
a system is restricted to 1024. Hence, for supporting 256 snapshots per volume, the following
options must be set on Gluster volume. These changes will allow bricks and glusterd to
communicate using non-privileged ports.
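The first of these settings is typically the server.allow-insecure volume option; a sketch, assuming a volume named VOLNAME:
# gluster volume set VOLNAME server.allow-insecure on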
2. Edit the /etc/glusterfs/glusterd.vol in each Red Hat Gluster Storage node, and
add the following setting:
option rpc-auth-allow-insecure on
3. Restart the glusterd service on each Red Hat Gluster Storage node using the following command:
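A sketch, using the service manager appropriate to the node (Red Hat Enterprise Linux 6 or 7 respectively):
# service glusterd restart
# systemctl restart glusterd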
Recommended Setup
The recommended setup for using Snapshot is described below. In addition, you must ensure to read
Chapter 21, Tuning for Performance for enhancing snapshot performance:
For each volume brick, create a dedicated thin pool that contains the brick of the volume and
its (thin) brick snapshots. With the current thin-p design, avoid placing the bricks of different
Red Hat Gluster Storage volumes in the same thin pool, as this reduces the performance of
snapshot operations, such as snapshot delete, on other unrelated volumes.
The recommended thin pool chunk size is 256KB. There might be exceptions to this in cases
where detailed information about the customer's workload is available.
The recommended pool metadata size is 0.1% of the thin pool size for a chunk size of 256KB or
larger. In special cases, where we recommend a chunk size less than 256KB, use a pool
metadata size of 0.5% of thin pool size.
For Example
pvcreate /dev/sda1
Use the correct dataalignment option based on your device. For more information, see
Section 21.2, “Brick Configuration”.
2. Create a Volume Group (VG) from the PV using the following command:
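A sketch, assuming an illustrative volume group name of dummyvg:
# vgcreate dummyvg /dev/sda1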
A thin pool of size 1 TB is created, using a chunksize of 256 KB. Maximum pool metadata size of
16 G is used.
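A sketch of such a thin pool creation, assuming the dummyvg volume group and an illustrative pool name of dummypool:
# lvcreate -L 1T -T dummyvg/dummypool -c 256k --poolmetadatasize 16G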
4. Create a thinly provisioned volume from the previously created pool using the following
command:
5. Create a file system (XFS) on this. Use the recommended options to create the XFS file system
on the thin LV.
For example,
6. Mount this logical volume and use the mount path as the brick.
Red Hat Gluster Storage volume has to be present and the volume has to be in the Started
state.
All the bricks of the volume have to be on an independent thin logical volume(LV).
All the bricks of the volume should be up and running, unless it is an n-way replication where n
>= 3. In that case, quorum must be met. For more information, see Chapter 8, Managing
Snapshots.
No other volume operation, like rebalance, add-brick, etc, should be running on the
volume.
Total number of snapshots in the volume should not be equal to Effective snap-max-hard-limit.
For more information see Configuring Snapshot Behavior.
If you have a geo-replication setup, then pause the geo-replication session if it is running, by
executing the following command:
For example,
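A sketch, assuming an illustrative session replicating master_vol to slave_vol on example.com:
# gluster volume geo-replication master_vol example.com::slave_vol pause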
Ensure that you take the snapshot of the master volume and then take a snapshot of the slave
volume.
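The snapshot creation syntax, as a sketch:
# gluster snapshot create <snapname> <volname> [no-timestamp] [description <description>] [force]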
where,
VOLNAME(S) - Name of the volume for which the snapshot will be created. Only creating a
snapshot of a single volume is supported.
description - This is an optional field that can be used to provide a description of the snap that
will be saved along with the snap.
force - By default, snapshot creation fails if any brick is down. In an n-way replicated Red Hat
Gluster Storage volume where n >= 3, a snapshot is allowed even if some of the bricks are down;
in that case, quorum is checked. Quorum is checked only when the force option is provided;
otherwise, snapshot creation fails if any brick is down. Refer to the Overview section for more
details on quorum.
no-timestamp - By default, a timestamp is appended to the snapshot name. If you do not want
to append a timestamp, pass no-timestamp as an argument.
For Example 1:
For Example 2:
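A sketch of two such invocations, assuming a volume named test_vol:
# gluster snapshot create snap1 test_vol no-timestamp
# gluster snapshot create snap1 test_vol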
Taking a snapshot of a Red Hat Gluster Storage volume creates a read-only Red Hat Gluster Storage
volume. This volume has a configuration identical to that of the original (parent) volume. Bricks of this
newly created snapshot are mounted as /var/run/gluster/snaps/<snap-volume-
name>/brick<bricknumber>.
For example, a snapshot with snap volume name 0888649a92ea45db8c00a615dfc5ea35 and having
two bricks will have the following two mount points:
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick1
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick2
NOTE
If you have a geo-replication setup, after creating the snapshot, resume the geo-
replication session by running the following command:
For example,
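A sketch, assuming the same illustrative geo-replication session as before:
# gluster volume geo-replication master_vol example.com::slave_vol resume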
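The clone syntax, as a sketch:
# gluster snapshot clone <clonename> <snapname>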
where,
clonename: The name of the clone, i.e., the new volume that will be created.
NOTE
Unlike restoring a snapshot, the original snapshot is retained after it has been cloned.
The snapshot should be in the activated state and all the snapshot bricks should be
in the running state before taking a clone. The server nodes should also be in quorum.
This is a space-efficient clone, therefore both the clone (new volume) and the
snapshot share the same LVM backend. The space consumption of the LVM
grows as the new volume (clone) diverges from the snapshot.
For example:
To check the status of the newly cloned snapshot, execute the following command:
For example:
In the example, the clone is in the Created state, similar to a newly created volume. The volume must
be started explicitly before it can be used.
where,
VOLNAME - This is an optional field and if provided lists the snapshot names of all snapshots
present in the volume.
For Example:
where,
snapname - This is an optional field. If the snapname is provided then the information about the
specified snap is displayed.
VOLNAME - This is an optional field. If the VOLNAME is provided the information about all the
snaps in the specified volume is displayed.
For Example:
This command displays the running status of the snapshot. By default the status of all the snapshots in
the cluster is displayed. To check the status of all the snapshots that are taken for a particular volume,
specify a volume name:
where,
snapname - This is an optional field. If the snapname is provided then the status about the
specified snap is displayed.
VOLNAME - This is an optional field. If the VOLNAME is provided the status about all the snaps
in the specified volume is displayed.
For Example:
Brick Path :
10.70.42.248:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick
1/brick1
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 1640
Data Percentage : 1.54
LV Size : 616.00m
Brick Path :
10.70.43.139:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick
2/brick3
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 3900
Data Percentage : 1.80
LV Size : 616.00m
Brick Path :
10.70.43.34:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick3
/brick4
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 3507
Data Percentage : 1.80
LV Size : 616.00m
snap-max-hard-limit: If the snapshot count in a volume reaches this limit, then no further
snapshot creation is allowed. The range is from 1 to 256. Once this limit is reached, you have to
remove snapshots to create further snapshots. This limit can be set for the system or per
volume. If both the system limit and the volume limit are configured, then the effective
maximum limit is the lower of the two values.
snap-max-soft-limit: This is a percentage value. The default value is 90%. This configuration
works along with auto-delete feature. If auto-delete is enabled then it will delete the oldest
snapshot when snapshot count in a volume crosses this limit. When auto-delete is disabled it
will not delete any snapshot, but it will display a warning message to the user.
auto-delete: This will enable or disable auto-delete feature. By default auto-delete is disabled.
When enabled it will delete the oldest snapshot when snapshot count in a volume crosses the
snap-max-soft-limit. When disabled it will not delete any snapshot, but it will display a warning
message to the user
where:
VOLNAME: This is an optional field. The name of the volume for which the configuration
values are to be displayed.
If the volume name is not provided, then the configuration values of all volumes are displayed.
System configuration details are displayed irrespective of whether the volume name is
specified or not.
For Example:
Volume : test_vol
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)
Volume : test_vol1
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)
where:
VOLNAME: This is an optional field. The name of the volume for which the configuration
values are to be changed. If the volume name is not provided, then running the command
will set or change the system limit.
snap-max-hard-limit: Maximum hard limit for the system or the specified volume.
For Example:
where:
force: If some of the bricks of the snapshot volume are down then use the force command to
start them.
For Example:
where:
For example:
No volume operation (e.g. add-brick, rebalance, etc) should be running on the original / parent
volume of the snapshot.
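A snapshot is then removed with the delete command; snapname is a placeholder for the snapshot to be deleted:
# gluster snapshot delete snapname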
where,
For Example:
NOTE
A Red Hat Gluster Storage volume cannot be deleted if any snapshot is associated with the volume. You must delete all the snapshots before issuing a volume delete.
To delete all the snapshots present in a system, execute the following command:
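The delete subcommand accepts the all keyword for this:
# gluster snapshot delete all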
To delete all the snapshots present in a specified volume, execute the following command:
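Here the volume keyword with a VOLNAME placeholder selects the volume whose snapshots are deleted:
# gluster snapshot delete volume VOLNAME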
No volume operation (e.g. add-brick, rebalance, etc) should be running on the origin or parent
volume of the snapshot.
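The restore itself is performed with the restore command; snapname is a placeholder for the snapshot to restore:
# gluster snapshot restore snapname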
where,
For Example:
After snapshot is restored and the volume is started, trigger a self-heal by running the
following command:
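A full self-heal on the restored volume can be triggered as follows, with VOLNAME standing for the restored volume:
# gluster volume heal VOLNAME full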
NOTE
After a restore, the brick path of the original volume changes. If you are using fstab to mount the bricks of the origin volume, you must fix the fstab entries after the restore. For more information, see https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/apcs04s07.html
In the cluster, identify the nodes participating in the snapshot with the snapshot status
command. For example:
Brick Path : 10.70.43.46:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick2/brick2
Volume Group : snap_lvgrp
Brick Running : Yes
Brick PID : 8303
Data Percentage : 0.43
LV Size : 2.60g
Brick Path : 10.70.42.33:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick3/brick3
Volume Group : snap_lvgrp
Brick Path : 10.70.42.34:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick4/brick4
Volume Group : snap_lvgrp
Brick Running : Yes
Brick PID : 23557
Data Percentage : 12.41
LV Size : 2.60g
4. Start the slave volume first and then the master volume.
For example,
Since the Red Hat Gluster Storage snapshot volume is read-only, no write operations are allowed on
this mount. After mounting the snapshot the entire snapshot content can then be accessed in a read-
only mode.
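A snapshot is mounted in the same way as a regular volume, using the snaps path on a node that hosts it. A sketch, with hostname, snapname, the parent volume name VOLNAME, and the mount point as placeholders:
# mount -t glusterfs hostname:/snaps/snapname/VOLNAME /mnt/snapshot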
NOTE
Snapshots can also be accessed via User Serviceable Snapshots. For more information see,
Section 8.13, “User Serviceable Snapshots”
8.12.1. Prerequisites
To initialize snapshot scheduler on all the nodes of the cluster, execute the following
command:
snap_scheduler.py init
This command initializes the snap_scheduler and interfaces it with the crond running on the
local node. This is the first step, before executing any scheduling related commands from a
node.
NOTE
This command has to be run on all the nodes participating in the scheduling.
Other options can be run independently from any node, where initialization has
been successfully completed.
All nodes in the cluster have their times synced using NTP or any other mechanism. This is a
hard requirement for this feature to work.
If you are on Red Hat Enterprise Linux 7.1 or later, set the
cron_system_cronjob_use_shares boolean to on by running the following command:
# setsebool -P cron_system_cronjob_use_shares on
NOTE
There is a latency of one minute between providing a command through the helper script and the command taking effect. Hence, snapshot schedules with per-minute granularity are currently not supported.
snap_scheduler.py enable
For example:
# snap_scheduler.py enable
snap_scheduler: Snapshot scheduling is enabled
snap_scheduler.py disable
For example:
# snap_scheduler.py disable
snap_scheduler: Snapshot scheduling is disabled
snap_scheduler.py status
For example:
# snap_scheduler.py status
snap_scheduler: Snapshot scheduling status: Disabled
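A schedule is added with the add operation of the helper script; its three arguments are described below:
# snap_scheduler.py add "Job Name" "Schedule" VOLNAME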
where,
Job Name: This name uniquely identifies this particular schedule, and can be used to reference this
schedule for future events like edit/delete. If a schedule already exists for the specified Job Name, the
add command will fail.
Schedule: The schedules are accepted in the format crond understands. For example:
Volume name: The name of the volume on which the scheduled snapshot operation will be performed
For example:
NOTE
The snapshots taken by the scheduler will have the following naming convention:
Scheduled-<Job Name>-<volume name>_<Timestamp>.
For example:
Scheduled-Job1-test_vol_GMT-2015.06.19-09.47.01
where,
Job Name: This name uniquely identifies this particular schedule, and can be used to reference this
schedule for future events like edit/delete. If a schedule already exists for the specified Job Name, the
add command will fail.
Schedule: The schedules are accepted in the format crond understands. For example:
Volume name: The name of the volume on which the snapshot schedule will be edited.
For Example:
snap_scheduler.py list
For example:
# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
Job0             * * * * *        Snapshot Create  test_vol
where,
Job Name: This name uniquely identifies the particular schedule that has to be deleted.
For example:
Consider a scenario where a user wants to access a file test.txt which was in the Home directory a
couple of months earlier and was deleted accidentally. You can now easily go to the virtual .snaps
directory that is inside the home directory and recover the test.txt file using the cp command.
NOTE
User Serviceable Snapshot is not the recommended option for bulk data access
from an earlier snapshot volume. For such scenarios it is recommended to
mount the Snapshot volume and then access the data. For more information see,
Chapter 8, Managing Snapshots
As the number of active snapshots grows, the total memory footprint of the snapshot daemon (snapd) also grows. Therefore, on a low-memory system, the snapshot daemon can get OOM killed if there are too many active snapshots.
For example:
For example:
For every snapshot available for a volume, any user who has access to the volume will have a read-only
view of the volume. You can recover the files through these read-only views of the volume from
different point in time. Each snapshot of the volume will be available in the .snaps directory of every
directory of the mounted volume.
NOTE
For NFS mount refer Section 6.2.2.2.1, “Manually Mounting Volumes Using Gluster NFS”
for more details. Following command is an example.
For FUSE mount refer Section 6.1.3.2, “Mounting Volumes Manually” for more details.
Following command is an example.
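The following mount commands are illustrative sketches; the server name server1, the volume name test-vol, and the mount point /mnt/glusterfs are placeholders. Gluster NFS requires NFS version 3:
# mount -t nfs -o vers=3 server1:/test-vol /mnt/glusterfs
# mount -t glusterfs server1:/test-vol /mnt/glusterfs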
The .snaps directory is a virtual directory which will not be listed by either the ls command, or the
ls -a option. The .snaps directory will contain every snapshot taken for that given volume as
individual directories. Each of these snapshot entries will in turn contain the data of the particular
directory the user is accessing from when the snapshot was taken.
1. Go to the folder where the file was present when the snapshot was taken. For example, if you
had a test.txt file in the root directory of the mount that has to be recovered, then go to that
directory.
# cd /mnt/glusterfs
NOTE
Since every directory has a virtual .snaps directory, you can enter the .snaps
directory from here. Since .snaps is a virtual directory, ls and ls -a
command will not list the .snaps directory. For example:
# ls -a
. .. Bob John test1.txt test2.txt
# cd .snaps
For example:
# ls -p
snapshot_Dec2014/ snapshot_Nov2014/ snapshot_Oct2014/
snapshot_Sept2014/
For example:
cd snapshot_Nov2014
# ls -p
John/ test1.txt test2.txt
# cp -p test2.txt $HOME
8.13.3. Viewing and Retrieving Snapshots using CIFS for Windows Client
For every snapshot available for a volume, any user who has access to the volume will have a read-only
view of the volume. You can recover the files through these read-only views of the volume from
different point in time. Each snapshot of the volume will be available in the .snaps folder of every
folder in the root of the CIFS share. The .snaps folder is a hidden folder which will be displayed only
when the following option is set to ON on the volume using the following command:
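The option involved here is features.show-snapshot-directory; a sketch of the command, with VOLNAME as a placeholder:
# gluster volume set VOLNAME features.show-snapshot-directory on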
After the option is set to ON, every Windows client can access the .snaps folder by following these
steps:
1. In the Folder options, enable the Show hidden files, folders, and drives option.
NOTE
The .snaps folder is accessible only in the root of the CIFS share and not in any
sub folders.
3. The list of snapshots is available in the .snaps folder. You can now access the required file and retrieve it.
You can also access snapshots on Windows using Samba. For more information see, Section 6.3.6,
“Accessing Snapshots in Windows”.
8.14. TROUBLESHOOTING
Situation
Snapshot creation fails.
Step 1
Check if the bricks are thinly provisioned by following these steps:
1. Execute the mount command and check the device name mounted on the brick path. For
example:
# mount
/dev/mapper/snap_lvgrp-snap_lgvol on /rhgs/brick1 type xfs (rw)
/dev/mapper/snap_lvgrp1-snap_lgvol1 on /rhgs/brick2 type xfs (rw)
2. Run the following command to check if the device has a LV pool name.
lvs device-name
For example:
If the Pool field is empty, then the brick is not thinly provisioned.
3. Ensure that the brick is thinly provisioned, and retry the snapshot create command.
Step 2
Check if the bricks are down by following these steps:
2. If any bricks are down, then start the bricks by executing the following command:
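A likely way to bring the bricks back up is to force-start the volume; VOLNAME is a placeholder:
# gluster volume start VOLNAME force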
Step 3
Check if the node is down by following these steps:
2. If a brick is not listed in the status, then execute the following command:
3. If the status of the node hosting the missing brick is Disconnected, then power-up the
node.
Step 4
Check if rebalance is in progress by following these steps:
Situation
Snapshot delete fails.
Step 1
Check if the server quorum is met by following these steps:
2. If nodes are down, and the cluster is not in quorum, then power up the nodes.
Situation
Snapshot delete command fails on some node(s) during commit phase, leaving the system
inconsistent.
Solution
1. Identify the node(s) where the delete command failed. This information is available in the
delete command's error output. For example:
2. On the node where the delete command failed, bring down glusterd using the following
command:
# rm -rf /var/lib/glusterd/snaps/snapshot1
5. Repeat the 2nd, 3rd, and 4th steps on all the nodes where the commit failed as identified in
the 1st step.
Situation
Snapshot restore fails.
Step 1
Check if the server quorum is met by following these steps:
2. If nodes are down, and the cluster is not in quorum, then power up the nodes.
Step 2
Check if the volume is in Stop state by following these steps:
2. If the volume is in Started state, then stop the volume using the following command:
Situation
The brick process is hung.
Solution
Check if the LVM data / metadata utilization had reached 100% by following these steps:
1. Execute the mount command and check the device name mounted on the brick path. For
example:
# mount
/dev/mapper/snap_lvgrp-snap_lgvol on /rhgs/brick1 type xfs
(rw)
/dev/mapper/snap_lvgrp1-snap_lgvol1 on /rhgs/brick2 type
xfs (rw)
lvs -v device-name
For example:
# lvs -o data_percent,metadata_percent -v /dev/mapper/snap_lvgrp-snap_lgvol
  Using logical volume(s) on command line
  Data%  Meta%
  0.40
NOTE
Ensure that the data and metadata do not reach the maximum limit. Using monitoring tools such as Nagios helps ensure that you do not run into such situations. For more information about Nagios, see Chapter 18, Monitoring Red Hat Gluster Storage.
Situation
Snapshot commands fail.
Step 1
Check if there is a mismatch in the operating versions by following these steps:
1. Open the following file and check for the operating version:
/var/lib/glusterd/glusterd.info
If the operating-version is less than 30000, then the snapshot commands are not supported in the version the cluster is operating on.
2. Upgrade all nodes in the cluster to Red Hat Gluster Storage 3.2 or higher.
Situation
After rolling upgrade, snapshot feature does not work.
Solution
You must ensure to make the following changes on the cluster to enable snapshot:
CHAPTER 9. MANAGING DIRECTORY QUOTAS
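Quota usage is enabled per volume; a sketch of the enable command, with VOLNAME as a placeholder:
# gluster volume quota VOLNAME enable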
This command only enables quota behavior on the volume; it does not set any default disk usage limits.
To disable quota behavior on a volume, including any set disk usage limits, run the following command:
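The corresponding disable form:
# gluster volume quota VOLNAME disable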
IMPORTANT
When you disable quotas on Red Hat Gluster Storage 3.1.1 and earlier, all previously
configured limits are removed from the volume by a cleanup process, quota-remove-
xattr.sh. If you re-enable quotas while the cleanup process is still running, the
extended attributes that enable quotas may be removed by the cleanup process. This
has negative effects on quota accounting.
When specifying a directory to limit with the gluster volume quota command, the
directory's path is relative to the Red Hat Gluster Storage volume mount point, not the root
directory of the server or client on which the volume is mounted. That is, if the Red Hat Gluster
Storage volume is mounted at /mnt/glusterfs and you want to place a limit on the
/mnt/glusterfs/dir directory, use /dir as the path when you run the gluster volume
quota command, like so:
Ensure that at least one brick is available per replica set when you run the gluster volume
quota command. A brick is available if a Y appears in the Online column of gluster
volume status command output, like so:
Use the following command to limit the total allowed size of a directory, or the total amount of space to
be consumed on a volume.
For example, to limit the size of the /dir directory on the data volume to 100 GB, run the following
command:
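Using the volume and directory named above:
# gluster volume quota data limit-usage /dir 100GB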
This prevents the /dir directory and all files and directories underneath it from containing more than
100 GB of data cumulatively.
To limit the size of the entire data volume to 1 TB, set a 1 TB limit on the root directory of the volume,
like so:
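With the same data volume:
# gluster volume quota data limit-usage / 1TB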
You can also set a percentage of the hard limit as a soft limit. Exceeding the soft limit for a directory
logs warnings rather than preventing further disk usage. For example, to set a soft limit at 75% of your
volume's hard limit of 1TB, run the following command.
The default soft limit is 80%. However, you can alter the default soft limit on a per-volume basis by
using the default-soft-limit subcommand. For example, to set a default soft limit of 90% on the
data volume, run the following command:
Then verify that the new value is set with the following command:
Changing the default soft limit does not remove a soft limit set with the limit-usage subcommand.
To view limit information for a particular directory, specify the directory path. Remember that the
directory's path is relative to the Red Hat Gluster Storage volume mount point, not the root directory
of the server or client on which the volume is mounted.
For example, to view limits set on the /dir directory of the test-volume volume:
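A sketch of the list form for a single directory:
# gluster volume quota test-volume list /dir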
You can also list multiple directories to display disk limit information on each directory specified, like
so:
By default, the df utility does not take quota limits into account when reporting disk usage. This means
that clients accessing directories see the total space available to the volume, rather than the total
space allotted to their directory by quotas. You can configure a volume to display the hard quota limit
as the total disk space instead by setting quota-deem-statfs parameter to on.
This configures df to display the hard quota limit as the total disk space for a client.
The following example displays the disk usage as seen from a client when quota-deem-statfs is set
to off:
# df -hT /home
Filesystem Type Size Used Avail Use% Mounted on
server1:/test-volume fuse.glusterfs 400G 12G 389G 3% /home
The following example displays the disk usage as seen from a client when quota-deem-statfs is set
to on:
# df -hT /home
Filesystem Type Size Used Avail Use% Mounted on
server1:/test-volume fuse.glusterfs 300G 12G 289G 4% /home
The soft-timeout parameter specifies how often Red Hat Gluster Storage checks space usage when
usage has, so far, been below the soft limit set on the directory or volume. The default soft timeout
frequency is every 60 seconds.
The hard-timeout parameter specifies how often Red Hat Gluster Storage checks space usage when
usage is greater than the soft limit set on the directory or volume. The default hard timeout frequency
is every 5 seconds.
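Both timeouts are adjusted with the corresponding quota subcommands; the values below are illustrative and VOLNAME is a placeholder:
# gluster volume quota VOLNAME soft-timeout 60
# gluster volume quota VOLNAME hard-timeout 5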
IMPORTANT
Ensure that you take system and application workload into account when you set soft
and hard timeouts, as the margin of error for disk usage is proportional to system
workload.
For example, to remove the disk limit usage on /data directory of test-volume:
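A sketch of the remove form:
# gluster volume quota test-volume remove /data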
This does not remove limits recursively; it only impacts a volume-wide limit.
CHAPTER 10. MANAGING GEO-REPLICATION
Geo-replication uses a master–slave model, where replication and mirroring occurs between the
following partners:
Master – a Red Hat Gluster Storage volume.
Slave – a Red Hat Gluster Storage volume. A slave volume can be a volume on a remote host, such as remote-host::volname.
Replicated Volumes: Mirrors data across bricks within one trusted storage pool. Synchronous replication: each and every file operation is applied to all the bricks.
Geo-replication: Mirrors data across geographically distributed trusted storage pools. Asynchronous replication: checks for changes in files periodically, and syncs them on detecting differences.
Geo-replication provides an incremental replication service over Local Area Networks (LANs), Wide
Area Network (WANs), and the Internet. This section illustrates the most common deployment
scenarios for geo-replication, including the following:
1. Verify that your environment matches the minimum system requirements. See Section 10.3.3,
“Prerequisites”.
2. Determine the appropriate deployment scenario. See Section 10.3.1, “Exploring Geo-
replication Deployment Scenarios”.
3. Start geo-replication on the master and slave systems. See Section 10.4, “Starting Geo-
replication”.
10.3.3. Prerequisites
The following are prerequisites for deploying geo-replication:
The master and slave volumes must run the same version of Red Hat Gluster Storage.
The slave node must not be a peer of any of the nodes of the master trusted storage pool.
Passwordless SSH access is required between one node of the master volume (the node from
which the geo-replication create command will be executed), and one node of the slave
volume (the node whose IP/hostname will be mentioned in the slave name when running the
geo-replication create command).
Create the public and private keys using ssh-keygen (without passphrase) on the master
node:
# ssh-keygen
Copy the public key to the slave node using the following command:
If you are setting up a non-root geo-replication session, then copy the public key to the respective user location.
NOTE
- Passwordless SSH access is required from the master node to slave node, whereas
passwordless SSH access is not required from the slave node to master node.
A passwordless SSH connection is also required for gsyncd between every node in the
master to every node in the slave. The gluster system:: execute gsec_create
command creates secret-pem files on all the nodes in the master, and is used to implement
the passwordless SSH connection. The push-pem option in the geo-replication create
command pushes these keys to all the nodes in the slave.
Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session” - In this method,
the slave mount is owned by the root user.
Section 10.3.4.2, “Setting Up your Environment for a Secure Geo-replication Slave” - This
method is more secure as the slave mount is owned by a normal user.
Time Synchronization
Before configuring the geo-replication environment, ensure that the time on all the servers are
synchronized.
The time on all the bricks of a geo-replicated master volume must be uniform. It is recommended to set up an NTP (Network Time Protocol) service to keep the bricks' time synchronized, and to avoid out-of-time sync effects. For example, in a replicated volume where brick1 of the master has the time 12:20 and brick2 of the master has the time 12:10, with a 10 minute time lag, all the changes on brick2 in this period may go unnoticed during synchronization of files with the slave.
1. To create a common pem pub file, run the following command on the master node where the
passwordless SSH connection is configured:
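The pem pub file is generated with the gsec_create command:
# gluster system:: execute gsec_create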
2. Create the geo-replication session using the following command. The push-pem option is
needed to perform the necessary pem-file setup on the slave nodes.
For example:
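Using the volume names that appear elsewhere in this guide (Volume1 as the master, example.com::slave-vol as the slave), the command takes a form like the following:
# gluster volume geo-replication Volume1 example.com::slave-vol create push-pem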
NOTE
There must be passwordless SSH access between the node from which this
command is run, and the slave host specified in the above command. This
command performs the slave verification, which includes checking for a valid
slave URL, valid slave volume, and available space on the slave. If the
verification fails, you can use the force option which will ignore the failed
verification and create a geo-replication session.
For example:
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-
Volume”.
4. Start the geo-replication by running the following command on the master node:
For example,
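With the same illustrative names:
# gluster volume geo-replication Volume1 example.com::slave-vol start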
5. Verify the status of the created session by running the following command:
Geo-replication supports access to Red Hat Gluster Storage slaves through SSH using an unprivileged
account (user account with non-zero UID). This method is more secure and it reduces the master's
capabilities over slave to the minimum. This feature relies on mountbroker, an internal service of
glusterd which manages the mounts for unprivileged slave accounts. You must perform additional
steps to configure glusterd with the appropriate mountbroker's access control directives. The
following example demonstrates this process:
Perform the following steps on all the Slave nodes to setup an auxiliary glusterFS mount for the
unprivileged account:
1. In all the slave nodes, create a new group. For example, geogroup.
NOTE
You must not use multiple groups for the mountbroker setup. You can create
multiple user accounts but the group should be same for all the non-root users.
2. In all the slave nodes, create an unprivileged account. For example, geoaccount. Add geoaccount as a member of the geogroup group.
3. On any one of the Slave nodes, run the following command to set up mountbroker root
directory and group.
For example,
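A sketch, using the geogroup group created above and an illustrative mountbroker root directory:
# gluster-mountbroker setup /var/mountbroker-root geogroup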
4. On any one of the Slave nodes, run the following commands to add volume and user to the
mountbroker service.
For example,
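A sketch, with an illustrative slave volume name slavevol and the geoaccount user created above:
# gluster-mountbroker add slavevol geoaccount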
# gluster-mountbroker status
The output displays the mountbroker status for every peer node in the slave cluster.
After you set up an auxiliary glusterFS mount for the unprivileged account on all the slave nodes, perform the following steps to set up a non-root geo-replication session:
7. Set up passwordless SSH from one of the master nodes to the user on one of the slave nodes.
# ssh-keygen
# ssh-copy-id -i identity_file geoaccount@slave_node_IPaddress/Hostname
8. Create a common pem pub file by running the following command on the master node, where
the passwordless SSH connection is configured to the user on the slave node:
9. Create a geo-replication relationship between the master and the slave to the user by running
the following command on the master node:
For example,
If you have multiple slave volumes and/or multiple accounts, create a geo-replication session
with that particular user and volume.
For example,
For example,
# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount MASTERVOL SLAVEVOL_NAME
For example:
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-
Volume”.
12. Start the geo-replication with slave user by running the following command on the master
node:
For example,
13. Verify the status of geo-replication session by running the following command on the master
node:
For example,
If the volume to be removed is the last one for the mountbroker user, the user is also removed.
IMPORTANT
If you have a secured geo-replication setup, you must prefix the slave volume with the unprivileged user account in the command. For example, to execute a geo-replication status command, run the following:
For example:
IMPORTANT
You must create the geo-replication session before starting geo-replication. For more
information, see Section 10.3.4.1, “Setting Up your Environment for Geo-replication
Session”.
For example:
This command will start distributed geo-replication on all the nodes that are part of the master
volume. If a node that is part of the master volume is down, the command will still be
successful. In a replica pair, the geo-replication session will be active on any of the replica
nodes, but remain passive on the others.
After executing the command, it may take a few minutes for the session to initialize and
become stable.
NOTE
If you attempt to create a geo-replication session and the slave already has
data, the following error message will be displayed:
For example:
This command will force start geo-replication sessions on the nodes that are part of the master
volume. If it is unable to successfully start the geo-replication session on any node which is
online and part of the master volume, the command will still start the geo-replication sessions
on as many nodes as it can. This command can also be used to re-start geo-replication sessions
on the nodes where the session has died, or has not started.
For example:
To display information about all geo-replication sessions, use the following command:
To display information on all geo-replication sessions from a particular master volume, use the
following command:
Master Node: Master node and Hostname as listed in the gluster volume info
command output
Slave Node: IP address/hostname of the slave node to which master worker is connected
to.
Status: The status of the geo-replication worker can be one of the following:
Initializing: This is the initial phase of the Geo-replication session; it remains in this
state for a minute in order to make sure no abnormalities are present.
Active: The gsync daemon in this node is active and syncing the data.
Passive: A replica pair of the active node. The data synchronization is handled by the
active node. Hence, this node does not sync any data.
Faulty: The geo-replication session has experienced a problem, and the issue needs to
be investigated further. For more information, see Section 10.11, “Troubleshooting
Geo-replication” section.
Stopped: The geo-replication session has stopped, but has not been deleted.
Changelog Crawl: The changelog translator has produced the changelog and that is
being consumed by gsyncd daemon to sync data.
Hybrid Crawl: The gsyncd daemon is crawling the glusterFS file system and
generating pseudo changelog to sync data.
History Crawl: The gsyncd daemon consumes the history changelogs produced by
the changelog translator to sync data.
Entry: The number of pending entry (CREATE, MKDIR, RENAME, UNLINK etc) operations
per session.
Failures: The number of failures. If the failure count is more than zero, view the log files for
errors in the Master bricks.
Checkpoint Time : Displays the date and time of the checkpoint, if set. Otherwise, it
displays as N/A.
For example:
To delete a setting for a geo-replication config option, prefix the option with ! (exclamation mark). For
example, to reset log-level to the default value:
WARNING
Ensure that you perform these configuration changes when all the peers in the cluster are in the Connected (online) state. If you change the configuration when any of the peers is down, the geo-replication cluster will be in an inconsistent state when the node comes back online.
Configurable Options
The following table provides an overview of the configurable options for a geo-replication setting:
changelog-log-level LOGFILELEVEL: The log level for the changelog. The default log level is set to INFO.
use-tarssh [true | false]: The use-tarssh option allows tar over the Secure Shell protocol. Use this option to handle workloads of files that have not undergone edits.
checkpoint [LABEL|now]: Sets a checkpoint with the given LABEL. If the option is set as now, then the current time will be used as the label.
sync-acls [true | false]: Syncs ACLs to the slave cluster. By default, this option is enabled.
use-meta-volume [true | false]: Set this option to enable, to use a meta volume in geo-replication. By default, this option is disabled.
Red Hat Gluster Storage provides the ability to set geo-replication checkpoints. By setting a
checkpoint, synchronization information is available on whether the data that was on the master at
that point in time has been replicated to the slaves.
The label for a checkpoint can be set as the current time using now, or a particular label can be
specified, as shown below:
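Using the guide's illustrative session (Volume1 and example.com::slave-vol), a checkpoint can be set with the current time or with a custom LABEL:
# gluster volume geo-replication Volume1 example.com::slave-vol config checkpoint now
# gluster volume geo-replication Volume1 example.com::slave-vol config checkpoint LABEL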
To display the status of a checkpoint for a geo-replication session, use the following command:
For example, to delete the checkpoint set between Volume1 and example.com::slave-
vol:
For example:
NOTE
The stop command will fail if any node that is a part of the volume is offline, if it is unable to stop the geo-replication session on any particular node, or if the geo-replication session between the master and slave is not active.
For example:
Using force will stop the geo-replication session between the master and slave even if any
node that is a part of the volume is offline. If it is unable to stop the geo-replication session on
any particular node, the command will still stop the geo-replication sessions on as many nodes
as it can. Using force will also stop inactive geo-replication sessions.
IMPORTANT
You must first stop a geo-replication session before it can be deleted. For more
information, see Section 10.4.5, “Stopping a Geo-replication Session”.
reset-sync-time: The geo-replication delete command retains the information about the last
synchronized time. Due to this, if the same geo-replication session is recreated, then the
synchronization will continue from the time where it was left before deleting the session. For the geo-
replication session to not maintain any details about the deleted session, use the reset-sync-time
option with the delete command. Now, when the session is recreated, it starts synchronization from
the beginning just like a new session.
For example:
NOTE
The delete command will fail if any node that is a part of the volume is offline, or if the geo-replication session between the master and slave is still active.
IMPORTANT
The SSH keys are not removed from the master and slave nodes when the geo-replication session is deleted. You can manually remove the pem files, which contain the SSH keys, from the /var/lib/glusterd/geo-replication/ directory.
1. Run the following command on the master node where passwordless SSH connection is
configured, in order to create a common pem pub file.
2. Create the geo-replication session using the following command. The push-pem and force
options are required to perform the necessary pem-file setup on the slave nodes.
For example:
NOTE
There must be passwordless SSH access between the node from which this
command is run, and the slave host specified in the above command. This
command performs the slave verification, which includes checking for a valid
slave URL, valid slave volume, and available space on the slave.
3. After successfully setting up the shared storage volume, when a new node is added to the
cluster, the shared storage is not mounted automatically on this node. Neither is the
/etc/fstab entry added for the shared storage on this node. To make use of shared storage
on this node, execute the following commands:
For more information on setting up shared storage volume, see Section 11.10, “Setting up
Shared Storage Volume”.
For example:
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-
Volume”.
5. If a node is added at slave, stop the geo-replication session using the following command:
6. Start the geo-replication session between the slave and master forcefully, using the following
command:
7. Verify the status of the created session, using the following command:
For more information on installing Cron and configuring Cron jobs, see Automating System Tasks in
the Red Hat Enterprise Linux 7 System Administrator's Guide.
The script provided to schedule the geo-replication session, performs the following:
For example,
For example, to run geo-replication daily at 20:30 hours, run the following:
Run the following commands on the slave machine to promote it to be the master:
For example
You can now configure applications to use the slave volume for I/O operations.
10.7.2. Failback: Resuming Master and Slave back to their Original State
When the original master is back online, you can perform the following procedure on the original slave
so that it synchronizes the differences back to the original master:
1. Stop the existing geo-replication session from the original master to the original slave using the following command:
For example,
2. Create a new geo-replication session with the original slave as the new master, and the original master as the new slave, with the force option. Detailed information on creating a geo-replication session is available in Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”.
3. Start the special synchronization mode to speed up the recovery of data from the slave. This option adds the capability for geo-replication to ignore the files created before the indexing option was enabled. With this option, geo-replication synchronizes only those files which are created after the slave volume is made the master volume.
For example,
For example,
5. Stop the I/O operations on the original slave and set the checkpoint. By setting a checkpoint,
synchronization information is available on whether the data that was on the master at that
point in time has been replicated to the slaves.
For example,
6. Checkpoint completion ensures that the data from the original slave is restored back to the original master. However, since I/O was stopped on the slave before the checkpoint was set, you need to touch the slave mount for the checkpoint to complete.
# touch original_slave_mount
For example,
# touch /mnt/gluster/slavevol
# gluster volume geo-replication slave-vol master.com::Volume1 status detail
7. After the checkpoint is complete, stop and delete the current geo-replication session between
the original slave and original master
For example,
8. Reset the options that were set for promoting the slave volume as the master volume by
running the following commands:
For example,
9. Resume the original roles by starting the geo-rep session from the original master using the
following command:
For information on prerequisites, creating, and restoring snapshots of geo-replicated volume, see
Chapter 8, Managing Snapshots. Creation of a snapshot when geo-replication session is live is not
supported and creation of snapshot in this scenario will display the following error:
You must ensure to pause the geo-replication session before creating snapshot and resume geo-
replication session after creating the snapshot. Information on restoring geo-replicated volume is also
available in the Managing Snapshots chapter.
1. Verify that your environment matches the minimum system requirements listed in
Section 10.3.3, “Prerequisites”.
3. Configure the environment and create a geo-replication session between master-vol and
interimmaster-vol.
1. Create a common pem pub file, run the following command on the master node where the
passwordless SSH connection is configured:
2. Create the geo-replication session using the following command. The push-pem option is
needed to perform the necessary pem-file setup on the interimmaster nodes.
3. Verify the status of the created session by running the following command:
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-
Volume”.
This command will start distributed geo-replication on all the nodes that are part of the master
volume. If a node that is part of the master volume is down, the command will still be
successful. In a replica pair, the geo-replication session will be active on any of the replica
nodes, but remain passive on the others. After executing the command, it may take a few
minutes for the session to initialize and become stable.
1. Create a common pem pub file by running the following command on the interimmaster node where the passwordless SSH connection is configured:
2. On interimmaster node, create the geo-replication session using the following command.
The push-pem option is needed to perform the necessary pem-file setup on the slave
nodes.
3. Verify the status of the created session by running the following command:
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-
Volume”.
1. Stop geo-replication between the master and slave, using the following command:
Performance Tuning
When the following option is set, it has been observed that there is an increase in geo-replication
performance. On the slave volume, run the following command:
1. Create a geo-replication session locally within the LAN. For information on creating a geo-
replication session, see Section 10.3.4.1, “Setting Up your Environment for Geo-replication
Session”.
IMPORTANT
You must remember the order in which the bricks/disks are specified when
creating the slave volume. This information is required later for configuring the
remote geo-replication session over the WAN.
2. Ensure that the initial data on the master is synced to the slave volume. You can verify the
status of the synchronization by using the status command, as shown in Section 10.4.3,
“Displaying Geo-replication Status Information”.
For information on stopping and deleting the geo-replication session, see Section 10.4.5, “Stopping a Geo-replication Session” and Section 10.4.6, “Deleting a Geo-replication Session”.
IMPORTANT
For information on stopping and deleting the volume, see Section 11.11, “Stopping Volumes”
and Section 11.12, “Deleting Volumes”.
5. Remove the disks from the slave nodes, and physically transport them to the remote location.
Make sure to remember the order in which the disks were specified in the volume.
6. At the remote location, attach the disks and mount them on the slave nodes. Make sure that
the file system or logical volume manager is recognized, and that the data is accessible after
mounting it.
7. Configure a trusted storage pool for the slave using the peer probe command.
For information on configuring a trusted storage pool, see Chapter 4, Adding Servers to the
Trusted Storage Pool.
8. Delete the glusterFS-related attributes on the bricks. This should be done before creating the
volume. You can remove the glusterFS-related attributes by running the following command:
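The usual cleanup removes the volume-id and gfid extended attributes and the internal .glusterfs directory; ABSOLUTE_PATH_TO_BRICK is a placeholder:
# setfattr -x trusted.glusterfs.volume-id ABSOLUTE_PATH_TO_BRICK
# setfattr -x trusted.gfid ABSOLUTE_PATH_TO_BRICK
# rm -rf ABSOLUTE_PATH_TO_BRICK/.glusterfs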
Run the following command to ensure that there are no xattrs still set on the brick:
# getfattr -d -m . ABSOLUTE_PATH_TO_BRICK
9. After creating the trusted storage pool, create the Red Hat Gluster Storage volume with the
same configuration that it had when it was on the LAN. For information on creating volumes,
see Chapter 5, Setting Up Storage Volumes.
IMPORTANT
Make sure to specify the bricks in same order as they were previously when on
the LAN. A mismatch in the specification of the brick order may lead to data loss
or corruption.
10. Start and mount the volume, and check if the data is intact and accessible.
For information on starting and mounting volumes, see Section 5.11, “Starting Volumes” and
Chapter 6, Creating Access to Volumes.
11. Configure the environment and create a geo-replication session from the master to this
remote slave.
For information on configuring the environment and creating a geo-replication session, see
Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session” .
12. Start the geo-replication session between the master and the remote slave.
For information on starting the geo-replication session, see Section 10.4, “Starting Geo-
replication”.
13. Use the status command to verify the status of the session, and check if all the nodes in the
session are stable.
For information on the status, see Section 10.4.3, “Displaying Geo-replication Status
Information”.
The rollover-time option sets the rate at which the change log is consumed. The default rollover time is 60 seconds, but it can be configured to a faster rate. A recommended rollover-time for geo-replication is 10-15 seconds. To change the rollover-time option, use the following command:
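A sketch, using an illustrative value of 15 seconds and VOLNAME as a placeholder:
# gluster volume set VOLNAME changelog.rollover-time 15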
The fsync-interval option determines the frequency that updates to the change log are written to
disk. The default interval is 0, which means that updates to the change log are written synchronously
as they occur, and this may negatively impact performance in a geo-replication environment.
Configuring fsync-interval to a non-zero value will write updates to disk asynchronously at the specified interval. To change the fsync-interval option, use the following command:
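A sketch, using an illustrative value of 5 seconds:
# gluster volume set VOLNAME changelog.fsync-interval 5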
Explicit triggering of sync is supported only for directories and regular files.
Situation
The geo-replication status is displayed as Stable, but the data has not been completely synchronized.
Solution
A full synchronization of the data can be performed by erasing the index and restarting geo-
replication. After restarting geo-replication, it will begin a synchronization of the data using
checksums. This may be a long and resource intensive process on large data sets. If the issue persists,
contact Red Hat Support.
For more information about erasing the index, see Section 11.1, “Configuring Volume Options” .
Solution
Geo-replication requires rsync v3.0.0 or higher on the host and the remote machines. Verify if you
have installed the required version of rsync.
Situation
The geo-replication status is often displayed as Faulty, with a backtrace similar to the following:
Solution
This usually indicates that RPC communication between the master gsyncd module and slave gsyncd
module is broken. Make sure that the following pre-requisites are met:
Passwordless SSH is set up properly between the host and remote machines.
FUSE is installed on the machines. The geo-replication module mounts Red Hat Gluster
Storage volumes using FUSE to sync data.
Situation
In a cascading environment, the intermediate master is in a faulty state, and messages similar to the
following are in the log:
Solution
In a cascading configuration, an intermediate master is loyal to its original primary master. The above
log message indicates that the geo-replication module has detected that the primary master has
changed. If this change was deliberate, delete the volume-id configuration option in the session that
was initiated from the intermediate master.
Solution
The steps to configure a SSH connection for geo-replication have been updated. Use the steps as
described in Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”
CHAPTER 11. MANAGING RED HAT GLUSTER STORAGE VOLUMES
NOTE
Volume options can be configured while the trusted storage pool is online.
The current settings for a volume can be viewed using the following command:
The parameters defined in the file can then be applied to a volume as a group, rather than setting one
parameter at a time.
# touch /var/lib/glusterd/groups/filename
2. Add the parameters and values that you want to set on the volume to the created file as key-
value pairs, placing each parameter on a new line:
domain1.key1=value1
domain1.key2=value2
domain2.key3=value3
For example,
changelog.changelog=on
client.event-threads=6
cluster.brick-multiplex=on
For example,
NOTE
The configuration file created should be placed in all the hosts of the trusted storage
pool under /var/lib/glusterd/groups/. This can be achieved with the help of gdeploy
configuration file.
NOTE
The default values are subject to change, and may not be the same for all versions of
Red Hat Gluster Storage.
NOTE
Using auth.allow and auth.reject options, you can control access of only glusterFS
FUSE-based clients. Use nfs.rpc-auth-* options for NFS access control.
Brick compatibility is determined at volume start, and depends on volume options shared between bricks. When multiplexing is enabled, Red Hat recommends restarting volumes whenever volume configuration is changed in order to maintain the compatibility of the bricks grouped under a single process.
cluster.op-version: Allows you to set the operating version of the cluster. The op-version number cannot be downgraded and is set for all volumes in the cluster. The op-version is not listed as part of the gluster volume info command output. Allowed values: 30708 | 30712 | 31001 | 31101. Default: depends on the Red Hat Gluster Storage version first installed. For Red Hat Gluster Storage 3.3 the value is set to 31101 for a new deployment.
diagnostics.brick-log-flush-timeout: The length of time for which the log messages are buffered before being flushed to the logging infrastructure (gluster or syslog files) on the bricks. Allowed values: 30 - 300 seconds (30 and 300 included). Default: 120 seconds.
diagnostics.client-log-flush-timeout: The length of time for which the log messages are buffered before being flushed to the logging infrastructure (gluster or syslog files) on the clients. Allowed values: 30 - 300 seconds (30 and 300 included). Default: 120 seconds.
NOTE
The value set for nfs.enable-ino32 option is global and applies to all the volumes in the
Red Hat Gluster Storage trusted storage pool.
server.anonuid: Value of the UID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root UID (that is, 0) are changed to have the UID of the anonymous user. Allowed values: 0 - 4294967295. Default: 65534 (this UID is also known as nfsnobody).
server.anongid: Value of the GID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root GID (that is, 0) are changed to have the GID of the anonymous user. Allowed values: 0 - 4294967295. Default: 65534 (this GID is also known as nfsnobody).
storage.owner-uid: Sets the UID for the bricks of the volume. This option may be required when some of the applications need the brick to have a specific UID to function correctly. Example: for QEMU integration the UID/GID must be qemu:qemu, that is, 107:107 (107 is the UID and GID of qemu). Allowed values: any integer greater than or equal to -1. Default: the UID of the bricks is not changed. This is denoted by -1.
storage.owner-gid: Sets the GID for the bricks of the volume. This option may be required when some of the applications need the brick to have a specific GID to function correctly. Example: for QEMU integration the UID/GID must be qemu:qemu, that is, 107:107 (107 is the UID and GID of qemu). Allowed values: any integer greater than or equal to -1. Default: the GID of the bricks is not changed. This is denoted by -1.
1. Unmount the volume on all the clients using the following command:
# umount mount-point
3. Change the transport type. For example, to enable both tcp and rdma, execute the following command:
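A sketch, with VOLNAME as a placeholder:
# gluster volume set VOLNAME config.transport tcp,rdma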
4. Mount the volume on all the clients. For example, to mount using rdma transport, use the
following command:
When expanding replicated or distributed replicated volumes, the number of bricks being added must
be a multiple of the replica count. This also applies to arbitrated volumes. For example, to expand a
distributed replicated volume with a replica count of 3, you need to add bricks in multiples of 3 (such as
6, 9, 12, etc.).
You can also convert a replica 2 volume into an arbitrated replica 3 volume by following the
instructions in Section 5.8.5, “Converting to an arbitrated volume” .
Expanding a Volume
1. From any server in the trusted storage pool, use the following command to probe the server on
which you want to add a new brick:
For example:
For example:
4. Rebalance the volume to ensure that files will be distributed to the new brick. Use the
rebalance command as described in Section 11.9, “Rebalancing Volumes” .
Expanding a cold tier volume is the same as expanding a non-tiered volume. If you are reusing a brick, ensure that you perform the steps listed in Section 5.4.3, “Reusing a Brick from a Deleted Volume”.
1. Detach the tier by performing the steps listed in Section 17.7, “Detaching a Tier from a Volume”
2. From any server in the trusted storage pool, use the following command to probe the server on
which you want to add a new brick :
For example:
For example:
4. Rebalance the volume to ensure that files will be distributed to the new brick. Use the
rebalance command as described in Section 11.9, “Rebalancing Volumes” .
5. Reattach the tier to the volume with both old and new (expanded) bricks:
IMPORTANT
If you are reusing the brick, be sure to clearly wipe the existing data before attaching it to the
tiered volume.
You can expand a hot tier volume by attaching and adding bricks for the hot tier.
1. Detach the tier by performing the steps listed in Section 17.7, “Detaching a Tier from a Volume”
2. Reattach the tier to the volume with both old and new (expanded) bricks:
For example,
IMPORTANT
If you are reusing the brick, be sure to clearly wipe the existing data before attaching it to the
tiered volume.
1. From any server in the trusted storage pool, use the following command to probe the server on
which you want to add new bricks:
For example:
For example:
For example:
4. Rebalance the volume to ensure that the files will be distributed to the new brick. Use the
rebalance command as described in Section 11.9, “Rebalancing Volumes” .
When shrinking distributed replicated volumes, the number of bricks being removed must be a multiple
of the replica count. For example, to shrink a distributed replicated volume with a replica count of 3,
you need to remove bricks in multiples of 3 (such as 6, 9, 12, etc.). In addition, the bricks you are
removing must be from the same sub-volume (the same replica set). In a non-replicated volume, all
bricks must be available in order to migrate data and perform the remove brick operation. In a
replicated or arbitrated volume, at least one of the data bricks in the replica set must be available.
The guidelines are identical when removing a distribution set from a distributed replicated volume with
arbiter bricks. If you want to reduce the replica count of an arbitrated distributed replicated volume to
replica 2, you must remove only the arbiter bricks. If you want to reduce a volume from arbitrated
distributed replicated to distributed only, remove the arbiter brick and one replica brick from each
replica subvolume.
Shrinking a Volume
For example:
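The shrink begins by starting a remove-brick operation on the brick or bricks to be removed; VOLNAME and BRICK are placeholders:
# gluster volume remove-brick VOLNAME BRICK start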
NOTE
If the remove-brick command is run with force or without any option, the
data on the brick that you are removing will no longer be accessible at the
glusterFS mount point. When using the start option, the data is migrated to
other bricks, and on a successful commit the removed brick's information is
deleted from the volume configuration. Data can still be accessed directly on the
brick.
2. You can view the status of the remove brick operation using the following command:
For example:
3. When the data migration shown in the previous status command is complete, run the
following command to commit the brick removal:
For example,
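A sketch, with the same placeholders as above:
# gluster volume remove-brick VOLNAME BRICK commit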
4. After the brick removal, you can check the volume information using the following command:
For example:
NOTE
If the remove-brick command is run with force or without any option, the
data on the brick that you are removing will no longer be accessible at the
glusterFS mount point. When using the start option, the data is migrated to
other bricks, and on a successful commit the removed brick's information is
deleted from the volume configuration. Data can still be accessed directly on the
brick.
2. Use geo-replication config checkpoint to ensure that all the data in that brick is synced to
the slave.
2. Verify the checkpoint completion for the geo-replication session using the following
command:
3. You can view the status of the remove brick operation using the following command:
For example:
4. Stop the geo-replication session between the master and the slave:
5. When the data migration shown in the previous status command is complete, run the
following command to commit the brick removal:
For example,
6. After the brick removal, you can check the volume information using the following command:
1. Detach the tier by performing the steps listed in Section 17.7, “Detaching a Tier from a Volume”
For example:
NOTE
If the remove-brick command is run with force or without any option, the
data on the brick that you are removing will no longer be accessible at the
glusterFS mount point. When using the start option, the data is migrated to
other bricks, and on a successful commit the removed brick's information is
deleted from the volume configuration. Data can still be accessed directly on the
brick.
3. You can view the status of the remove brick operation using the following command:
For example:
4. When the data migration shown in the previous status command is complete, run the
following command to commit the brick removal:
For example,
5. Rerun the attach-tier command only with the required set of bricks:
For example,
You must first decide on which bricks should be part of the hot tiered volume and which bricks should
be removed from the hot tier volume.
1. Detach the tier by performing the steps listed in Section 17.7, “Detaching a Tier from a Volume”
2. Rerun the attach-tier command only with the required set of bricks:
A remove-brick operation that is in progress can be stopped by using the stop command.
NOTE
Files that were already migrated during remove-brick operation will not be migrated
back to the same brick when the operation is stopped.
For example:
To replace the entire subvolume with new bricks on a Distribute-replicate volume, follow these steps:
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
5. Verify the content on the brick after committing the remove-brick operation on the volume. If there are any leftover files, copy them through a FUSE or NFS mount.
1. Verify if there are any pending files on the bricks of the subvolume.
Along with files, all the application-specific extended attributes must be copied.
glusterFS also uses extended attributes to store its internal data. The extended
attributes used by glusterFS are of the form trusted.glusterfs.*,
trusted.afr.*, and trusted.gfid. Any extended attributes other than ones
listed above must also be copied.
Syntax:
If the mount point is /mnt/glusterfs and brick path is /rhgs/brick1, then the
script must be run as:
#!/bin/bash
# Sketch (assumption): list regular files still present on the brick, excluding
# gluster's internal .glusterfs directory, so leftovers can be copied via the mount ($MOUNT).
MOUNT=$1
BRICK=$2
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print
2. To identify a list of files that are in a split-brain state, execute the command:
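A sketch of the command (substitute your volume name):
# gluster volume heal VOLNAME info split-brain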
3. If there are any files listed in the output of the above command, compare the files
across the bricks in a replica set, delete the bad files from the brick and retain the
correct copy of the file. Manual intervention by the System Administrator would be
required to choose the correct copy of file.
Procedure to replace an old brick with a new brick on a Replicate or Distribute-replicate volume:
1. Ensure that the new brick (server5:/rhgs/brick1) that replaces the old brick
(server0:/rhgs/brick1) is empty. Ensure that all the bricks are online. The brick that must
be replaced can be in an offline state.
4. Data on the newly added brick would automatically be healed. It might take time depending
upon the amount of data to be healed. It is recommended to check heal information after
replacing a brick to make sure all the data has been healed before replacing/removing any
other brick.
For example:
Brick server1:/rhgs/brick1
Status: Connected
Number of entries: 0
Brick server2:/rhgs/brick1
Status: Connected
Number of entries: 0
Brick server3:/rhgs/brick1
Status: Connected
Number of entries: 0
The value of the Number of entries field is displayed as zero when the heal is complete.
IMPORTANT
In case of a Distribute volume type, replacing a brick using this procedure will result in
data loss.
---------------------------------------------------------
Brick server5:/rhgs/brick1 49156 Y 5731
NOTE
All the replace-brick command options except the commit force option are
deprecated.
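The replacement itself is performed with a command along the following lines, shown here as a sketch using the brick names from this example:
# gluster volume replace-brick VOLNAME server0:/rhgs/brick1 server5:/rhgs/brick1 commit force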
Procedure to replace an old brick with a new brick on a Dispersed or Distributed-dispersed volume:
1. Ensure that the new brick that replaces the old brick is empty. The brick that must be replaced
can be in an offline state but all other bricks must be online.
For example:
The new brick you are adding could be from the same server or you can add a new server and
then a new brick.
4. Data on the newly added brick would automatically be healed. It might take time depending
upon the amount of data to be healed. It is recommended to check heal information after
replacing a brick to make sure all the data has been healed before replacing/removing any
other brick.
For example:
Brick server1:/rhgs/brick2new
Status: Connected
Number of entries: 0
Brick server2:/rhgs/brick3
Status: Connected
Number of entries: 0
Brick server2:/rhgs/brick4
Status: Connected
Number of entries: 0
Brick server3:/rhgs/brick5
Status: Connected
Number of entries: 0
Brick server3:/rhgs/brick6
Status: Connected
Number of entries: 0
The value of the Number of entries field is displayed as zero when the heal is complete.
reset-brick lets you replace a brick with another brick of the same location and UUID. For example,
if you initially configured bricks so that they were identified with a hostname, but you want to use that
hostname somewhere else, you can use reset-brick to stop the brick, reconfigure it so that it is
identified by an IP address instead of the hostname, and return the reconfigured brick to the cluster.
To reconfigure a brick (replace a brick with another brick of the same hostname, path, and UUID),
perform the following steps:
1. Ensure that the quorum minimum will still be met when the brick that you want to reset is
taken offline.
2. If possible, Red Hat recommends stopping I/O, and verifying that no heal operations are
pending on the volume.
3. Run the following command to kill the brick that you want to reset.
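A sketch of the command (substitute your volume name and the brick to reset):
# gluster volume reset-brick VOLNAME HOSTNAME:BRICKPATH start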
5. Check that the volume's Volume ID displayed by gluster volume info matches the
volume-id (if any) of the offline brick.
For example, in the following dispersed volume, the Volume ID and the volume-id are both
ab8a981a-a6d9-42f2-b8a5-0b28fe2c4548.
# cat /var/lib/glusterd/vols/vol/vol.myhost.brick-gluster-vol-1.vol
| grep volume-id
option volume-id ab8a981a-a6d9-42f2-b8a5-0b28fe2c4548
6. Bring the reconfigured brick back online. There are two options for this:
If your brick did not have a volume-id in the previous step, run:
If your brick's volume-id matches your volume's identifier, Red Hat recommends adding
the force keyword to ensure that the operation succeeds.
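Sketches of the two variants (substitute your volume and brick details):
# gluster volume reset-brick VOLNAME HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit
# gluster volume reset-brick VOLNAME HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit force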
IMPORTANT
Ensure that the new peer has the exact disk capacity as that of the one it is replacing.
For example, if the peer in the cluster has two 100GB drives, then the new peer must
have the same disk capacity and number of drives.
In the following example the original machine which has had an irrecoverable failure is
server0.example.com and the replacement machine is server5.example.com. The brick with an
unrecoverable failure is server0.example.com:/rhgs/brick1 and the replacement brick is
server5.example.com:/rhgs/brick1.
2. Probe the new peer from one of the existing peers to bring it into the cluster.
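For example, from one of the existing peers:
# gluster peer probe server5.example.com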
2. Create geo-replication session again with force option to distribute the keys from new
nodes to Slave nodes.
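A sketch of the command, assuming placeholder master and slave session names:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force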
3. After successfully setting up the shared storage volume, when a new node is replaced in the cluster, the shared storage is not mounted automatically on this node, and no /etc/fstab entry is added for it. To make use of shared storage on this node, mount the volume and add the /etc/fstab entry manually. For more information on setting up the shared storage volume, see Section 11.10, “Setting up Shared Storage Volume”.
6. Create the required brick path in server5.example.com. For example, if /rhs/brick is the XFS mount point in server5.example.com, then create a brick directory in that path:
# mkdir /rhgs/brick1
9. Initiate self-heal on the volume.
10. The status of the heal process can be seen by executing the following command:
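Sketches of the commands (substitute your volume name); the first triggers the heal and the second reports its status:
# gluster volume heal VOLNAME
# gluster volume heal VOLNAME info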
12. Ensure that after the self-heal completes, the extended attributes are set to zero on the other
bricks in the replica.
/var/lib/glusterd/glusterd.info file.
In the following example, the host with the FQDN server0.example.com was irrecoverable and must be replaced with a new host that has the same FQDN. The following steps have to be performed on the new host.
3. Retrieve the UUID of the failed host (server0.example.com) from another node of the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
Hostname: server1.example.com
Uuid: 1d9677dc-6159-405e-9319-ad85ec030880
State: Peer in Cluster (Connected)
Hostname: server0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
4. Edit the glusterd.info file in the new host and include the UUID of the host you retrieved in
the previous step.
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
NOTE
The operating version of this node must be same as in other nodes of the
trusted storage pool.
5. Select any host (say for example, server1.example.com) in the Red Hat Gluster Storage Trusted
Storage Pool and retrieve its UUID from the glusterd.info file.
6. Gather the peer information files from the host (server1.example.com) in the previous step.
Execute the following command in that host (server1.example.com) of the cluster.
# cp -a /var/lib/glusterd/peers /tmp/
7. Remove the peer file corresponding to the failed host (server0.example.com) from the
/tmp/peers directory.
# rm /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
Note that the UUID corresponds to the UUID of the failed host (server0.example.com)
retrieved in Step 3.
8. Archive all the files and copy them to the failed host (server0.example.com).
10. Copy the extracted content to the /var/lib/glusterd/peers directory. Execute the
following command in the newly added host with the same name (server0.example.com) and IP
Address.
11. Select any other host in the cluster other than the node (server1.example.com) selected in
step 5. Copy the peer file corresponding to the UUID of the host retrieved in Step 4 to the new
host (server0.example.com) by executing the following command:
# scp /var/lib/glusterd/peers/<UUID-retrieved-from-step4>
root@Example1:/var/lib/glusterd/peers/
12. Retrieve the brick directory information, by executing the following command in any host in
the cluster.
In the above example, the brick path in server0.example.com is /rhgs/brick1. If the brick path does not exist in server0.example.com, perform steps a, b, and c.
# mkdir /rhgs/brick1
2. Retrieve the volume ID from the existing brick of another host by executing the following
command on any host that contains the bricks for the volume.
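A sketch of the command, assuming the brick path /rhgs/brick1 on that host:
# getfattr -n trusted.glusterfs.volume-id -e hex /rhgs/brick1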
3. Set this volume ID on the brick created in the newly added host and execute the following
command on the newly added host (server0.example.com).
For Example:
# setfattr -n trusted.glusterfs.volume-id -v
0x8f16258c88a0498fbd53368706af7496 /rhs/brick2/drv2
Data recovery is possible only if the volume type is replicate or distribute-replicate. If the
volume type is plain distribute, you can skip steps 12 and 13.
14. Perform the following operations to change the Automatic File Replication extended
attributes so that the heal process happens from the other brick
(server1.example.com:/rhgs/brick1) in the replica pair to the new brick
(server0.example.com:/rhgs/brick1). Note that /mnt/r2 is the FUSE mount path.
1. Create a new directory on the mount point and ensure that a directory with such a name is
not already present.
# mkdir /mnt/r2/<name-of-nonexistent-dir>
# rmdir /mnt/r2/<name-of-nonexistent-dir>
# setfattr -n trusted.non-existent-key -v abc /mnt/r2
# setfattr -x trusted.non-existent-key /mnt/r2
3. Ensure that the extended attributes on the other bricks in the replica (in this example, trusted.afr.vol-client-0) are not set to zero.
NOTE
Ensure that you perform steps 12, 13, and 14 for all the volumes that have bricks from server0.example.com.
17. You can view the gluster volume self-heal status by executing the following command:
2. Create geo-replication session again with force option to distribute the keys from new
nodes to Slave nodes.
3. After successfully setting up the shared storage volume, when a new node is replaced in
the cluster, the shared storage is not mounted automatically on this node. Neither is the
/etc/fstab entry added for the shared storage on this node. To make use of shared
storage on this node, execute the following commands:
For more information on setting up shared storage volume, see Section 11.10, “Setting up
Shared Storage Volume”.
Replacing a host with the same Hostname in a two-node Red Hat Gluster Storage Trusted Storage
Pool
If there are only 2 hosts in the Red Hat Gluster Storage Trusted Storage Pool where the host
server0.example.com must be replaced, perform the following steps:
3. Retrieve the UUID of the failed host (server0.example.com) from another peer in the Red Hat
Gluster Storage Trusted Storage Pool by executing the following command:
Hostname: server0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
4. Edit the glusterd.info file in the new host (server0.example.com) and include the UUID of
the host you retrieved in the previous step.
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
NOTE
The operating version of this node must be same as in other nodes of the
trusted storage pool.
For example,
# gluster system:: uuid get
UUID: 1d9677dc-6159-405e-9319-ad85ec030880
# touch /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880
UUID=<uuid-of-other-node>
state=3
hostname=<hostname>
NOTE
For example:
When run without the force option, the rebalance command attempts to balance the space utilized
across nodes. Files whose migration would cause the target node to have less available space than the
source node are skipped. This results in linkto files being retained, which may cause slower access
when a large number of linkto files are present.
Enhancements made to the file rename and rebalance operations in Red Hat Gluster Storage 2.1 update 5 require that all the clients connected to a cluster operate with the same or later versions. If the clients operate on older versions and a rebalance operation is performed, the following warning message is displayed and the rebalance operation is not executed.
Red Hat strongly recommends that you disconnect all older clients before executing the rebalance command to avoid a potential data loss scenario.
WARNING
The Rebalance command can be executed with the force option even when the
older clients are connected to the cluster. However, this could lead to a data loss
situation.
A rebalance operation run with force balances the data based only on the layout, and therefore optimizes or removes the link files, but it may lead to imbalanced storage space usage across bricks. Use this option only when there are a large number of link files in the system.
To rebalance a volume forcefully, use the following command on any of the servers:
For example:
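A sketch using a placeholder volume name:
# gluster volume rebalance VOLNAME start force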
By default, rebalance throttling is started in the normal mode. Configure the throttling mode to adjust the rate at which files are migrated.
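A sketch of the throttling configuration, assuming the rebal-throttle volume option (valid values are lazy, normal, and aggressive):
# gluster volume set VOLNAME rebal-throttle lazy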
For example:
For example:
This displays the estimated time left for the rebalance to complete on all nodes. The estimated time to
complete is displayed only after the rebalance operation has been running for 10 minutes. In cases
where the remaining time is extremely large, the estimated time to completion is displayed as >2
months and the user is advised to check again later.
The time taken to complete a rebalance operation depends on the number of files estimated to be on
the bricks and the rate at which files are being processed by the rebalance process. This value is
recalculated every time the rebalance status command is executed and becomes more accurate the
longer rebalance has been running, and for large data sets. The calculation assumes that a file system
partition contains a single brick.
The rebalance status is shown as completed when the rebalance is complete. For example:
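A sketch of the status command:
# gluster volume rebalance VOLNAME status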
For example:
cluster.enable-shared-storage
enable
When the volume set option is enabled, a gluster volume named gluster_shared_storage
is created in the cluster, and is mounted at /var/run/gluster/shared_storage on all the
nodes in the cluster.
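Enabling the option is done with a volume set command along these lines:
# gluster volume set all cluster.enable-shared-storage enable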
NOTE
This option cannot be enabled if there is only one node present in the
cluster, or if only one node is online in the cluster.
Before enabling this feature make sure that there is no volume named
gluster_shared_storage in the cluster. This volume name is reserved
for internal use only
After successfully setting up the shared storage volume, when a new node is added to the
cluster, the shared storage is not mounted automatically on this node. Neither is the
/etc/fstab entry added for the shared storage on this node. To make use of shared storage
on this node, execute the following commands:
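A sketch of the commands, where <node-ip/hostname> stands for the local node (adjust to your environment):
# mount -t glusterfs <node-ip/hostname>:gluster_shared_storage /var/run/gluster/shared_storage
# cp /etc/fstab /var/run/gluster/fstab.tmp
# echo "<node-ip/hostname>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab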
disable
When the volume set option is disabled, the gluster_shared_storage volume is
unmounted on all the nodes in the cluster, and then the volume is deleted. The mount entry
from /etc/fstab as part of disable is also removed.
For example:
IMPORTANT
After creating a cluster, execute the following command on all nodes present in the cluster:
IMPORTANT
Volumes must be unmounted and stopped before you can delete them. Ensure that you
also remove entries relating to this volume from the /etc/fstab file after the volume
has been deleted.
In Red Hat Gluster Storage, split-brain is a term applicable to Red Hat Gluster Storage volumes in a replicate configuration. A file is said to be in split-brain when the copies of the same file on different bricks that constitute the replica pair have mismatching data and/or metadata contents that conflict with each other, and automatic healing is not possible. In this scenario, you can decide which is the correct file (source) and which is the one that requires healing (sink) by inspecting the mismatching files on the backend bricks.
The AFR translator in glusterFS makes use of extended attributes to keep track of the operations on a
file. These attributes determine which brick is the source and which brick is the sink for a file that requires healing. If the files are clean, the extended attributes are all zeroes, indicating that no heal is
necessary. When a heal is required, they are marked in such a way that there is a distinguishable source
and sink and the heal can happen automatically. But, when a split-brain occurs, these extended
attributes are marked in such a way that both bricks mark themselves as sources, making automatic
healing impossible.
When a split-brain occurs, applications cannot perform certain operations like read and write on the
file. Accessing the files results in the application receiving an Input/Output Error.
The three types of split-brains that occur in Red Hat Gluster Storage are:
Data split-brain: Contents of the file under split-brain are different in different replica pairs and
automatic healing is not possible.
Metadata split-brain: The metadata of the files (for example, user-defined extended attributes) are different and automatic healing is not possible.
Entry split-brain: This happens when a file has different gfids on each of the replica pair.
The only way to resolve split-brain is by manually inspecting the file contents from the backend, deciding which is the true copy (source), and modifying the appropriate extended attributes such that healing can happen automatically.
The quorum configuration in a trusted storage pool determines the number of server failures that the
trusted storage pool can sustain. If an additional failure occurs, the trusted storage pool will become
unavailable. If too many server failures occur, or if there is a problem with communication between the
trusted storage pool nodes, it is essential that the trusted storage pool be taken offline to prevent data
loss.
After configuring the quorum ratio at the trusted storage pool level, you must enable the quorum on a
particular volume by setting cluster.server-quorum-type volume option as server. For more
information on this volume option, see Section 11.1, “Configuring Volume Options” .
Configuration of the quorum is necessary to prevent network partitions in the trusted storage pool. A network partition is a scenario where a small set of nodes might be able to communicate together across a functioning part of a network, but not be able to communicate with a different set of nodes in another part of the network. This can cause undesirable situations, such as split-brain in a distributed system. To prevent a split-brain situation, all the nodes in at least one of the partitions must stop running to avoid inconsistencies.
This quorum is on the server-side, that is, the glusterd service. Whenever the glusterd service on a
machine observes that the quorum is not met, it brings down the bricks to prevent data split-brain.
When the network connections are brought back up and the quorum is restored, the bricks in the
volume are brought back up. When the quorum is not met for a volume, any commands that update the
volume configuration, or that add or detach peers, are not allowed. Note that the glusterd service not running and the network connection between two machines being down are treated equally.
You can configure the quorum percentage ratio for a trusted storage pool. If the percentage ratio of
the quorum is not met due to network outages, the bricks of the volume participating in the quorum in
those nodes are taken offline. By default, the quorum is met if the percentage of active nodes is more
than 50% of the total storage nodes. However, if the quorum ratio is manually configured, then the
quorum is met only if the percentage of active storage nodes of the total storage nodes is greater than
or equal to the set value.
For example, to set the quorum to 51% of the trusted storage pool:
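A sketch of the command:
# gluster volume set all cluster.server-quorum-ratio 51%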
In this example, the quorum ratio setting of 51% means that more than half of the nodes in the trusted
storage pool must be online and have network connectivity between them at any given time. If a
network disconnect happens to the storage pool, then the bricks running on those nodes are stopped
to prevent further writes.
You must enable the quorum on a particular volume to participate in the server-side quorum by running the following command:
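A sketch (substitute your volume name):
# gluster volume set VOLNAME cluster.server-quorum-type server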
IMPORTANT
For a two-node trusted storage pool, it is important to set the quorum ratio to be greater
than 50% so that two nodes separated from each other do not both believe they have a
quorum.
For a replicated volume with two nodes and one brick on each machine, if the server-side quorum is
enabled and one of the nodes goes offline, the other node will also be taken offline because of the
quorum configuration. As a result, the high availability provided by the replication is ineffective. To
prevent this situation, a dummy node can be added to the trusted storage pool which does not contain
any bricks. This ensures that even if one of the nodes which contains data goes offline, the other node
will remain online. Note that if the dummy node and one of the data nodes go offline, the brick on the other node will also be taken offline, and this will result in data unavailability.
By default, when replication is configured, clients can modify files as long as at least one brick in the
replica group is available. If network partitioning occurs, different clients are only able to connect to
different bricks in a replica set, potentially allowing different clients to modify a single file
simultaneously.
For example, imagine a three-way replicated volume is accessed by two clients, C1 and C2, who both
want to modify the same file. If network partitioning occurs such that client C1 can only access brick
B1, and client C2 can only access brick B2, then both clients are able to modify the file independently,
creating split-brain conditions on the volume. The file becomes unusable, and manual intervention is
required to correct the issue.
Client-side quorum allows administrators to set a minimum number of bricks that a client must be able
to access in order to allow data in the volume to be modified. If client-side quorum is not met, files in
the replica set are treated as read-only. This is useful when three-way replication is configured.
Client-side quorum is configured on a per-volume basis, and applies to all replica sets in a volume. If client-side quorum is not met for X of Y replica sets, only those X replica sets are treated as read-only; the remaining replica sets continue to allow data modification.
cluster.quorum-count
The minimum number of bricks that must be available in order for writes to be allowed. This is set
on a per-volume basis. Valid values are between 1 and the number of bricks in a replica set. This
option is used by the cluster.quorum-type option to determine write behavior.
cluster.quorum-type
Determines when the client is allowed to write to a volume. Valid values are fixed and auto.
If cluster.quorum-type is fixed, writes are allowed as long as the number of bricks available in
the replica set is greater than or equal to the value of the cluster.quorum-count option.
If cluster.quorum-type is auto, writes are allowed when at least 50% of the bricks in a replica set are available. In a replica set with an even number of bricks, if exactly 50% of the bricks are available, the first brick in the replica set must be available in order for writes to continue.
In the above scenario, when the client-side quorum is not met for replica group A, only replica group
A becomes read-only. Replica groups B and C continue to allow data modifications.
IMPORTANT
When you integrate Red Hat Gluster Storage with Red Hat Enterprise Virtualization or Red Hat OpenStack, the client-side quorum is enabled when you run the gluster volume set VOLNAME group virt command. On a two-replica setup, if the first brick in the replica pair is offline, virtual machines are paused because quorum is not met and writes are disallowed.
Enable the quorum on a particular volume to participate in the server-side quorum by running the
following command:
In this example, the quorum ratio setting of 51% means that more than half of the nodes in the trusted
storage pool must be online and have network connectivity between them at any given time. If a
network disconnect happens to the storage pool, then the bricks running on those nodes are stopped
to prevent further writes.
Set the quorum-type option to auto to allow writes to the file only if the percentage of active replicate bricks is more than 50% of the total number of bricks that constitute that replica.
In this example, as there are only two bricks in the replica pair, the first brick must be up and running
to allow writes.
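A sketch of the option being set (substitute your volume name):
# gluster volume set VOLNAME cluster.quorum-type auto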
IMPORTANT
At least n/2 bricks need to be up for the quorum to be met. If the number of bricks (n) in a replica set is an even number, it is mandatory that the n/2 count consists of the primary brick, and it must be up and running. If n is an odd number, the n/2 count can have any brick up and running; that is, the primary brick need not be up and running to allow writes.
See Section 11.13.2.1, “ Recovering File Split-brain from the Mount Point” for information on
how to recover from data and meta-data split-brain from the mount point.
See Section 11.13.2.2, “Recovering File Split-brain from the gluster CLI” for information on how to recover from data and meta-data split-brain using the CLI.
For information on resolving gfid/entry split-brain, see Chapter 26, Manually Recovering File Split-
brain .
1. You can use a set of getfattr and setfattr commands to detect the data and meta-data
split-brain status of a file and resolve split-brain from the mount point.
IMPORTANT
This process for split-brain resolution from mount will not work on NFS mounts
as it does not provide extended attributes support.
In this example, the test-volume volume has bricks brick0, brick1, brick2 and brick3.
Brick3: test-host:/rhgs/brick2
Brick4: test-host:/rhgs/brick3
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
# tree -R /test/b?
/rhgs/brick0
├── dir
│ └── a
└── file100
/rhgs/brick1
├── dir
│ └── a
└── file100
/rhgs/brick2
├── dir
├── file1
├── file2
└── file99
/rhgs/brick3
├── dir
├── file1
├── file2
└── file99
In the following output, some of the files in the volume are in split-brain.
Brick test-host:/rhgs/brick1/
/file100
/dir
Number of entries in split-brain: 2
Brick test-host:/rhgs/brick2/
/file99
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2
Brick test-host:/rhgs/brick3/
<gfid:05c4b283-af58-48ed-999e-4d706c7b97d5>
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2
The above command executed from the mount provides information on whether a file is in data or meta-data split-brain. This command is not applicable to gfid/entry split-brain.
For example,
dir is in gfid/entry split-brain but, as mentioned earlier, the above command does not indicate whether a file is in gfid/entry split-brain. Hence, the command displays The file is not under data or metadata split-brain. For information on resolving gfid/entry split-brain, see Chapter 26, Manually Recovering File Split-brain.
2. Analyze the files in data and meta-data split-brain and resolve the issue.
When you perform operations like cat and getfattr from the mount on files in split-brain, an input/output error is returned. To analyze such files further, you can use the setfattr command.
Using this command, a particular brick can be chosen to access the file in split-brain.
For example,
file1 is in data-split-brain and when you try to read from the file, it throws input/output
error.
# cat file1
cat: file1: Input/output error
Setting test-client-2 as split-brain choice for file1 serves reads from b2 for the file.
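A sketch of the command run from the mount point for this example:
# setfattr -n replica.split-brain-choice -v "test-client-2" file1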
Now, you can perform operations on the file. For example, read operations on the file:
# cat file1
xyz
Trying to inspect the file from a wrong choice errors out. To undo the split-brain-choice that has been set, the above mentioned setfattr command can be used with none as the value for the extended attribute.
For example,
Now performing cat operation on the file will again result in input/output error, as before.
# cat file1
cat: file1: Input/output error
After you decide which brick to use as a source for resolving the split-brain, it must be set for
the healing to be done.
Example
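A sketch, assuming test-client-2 is chosen as the source for file1:
# setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1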
The above process can be used to resolve data and/or meta-data split-brain on all the files.
After setting the split-brain-choice on the file, the file can be analyzed only for five minutes. If
the duration of analyzing the file needs to be increased, use the following command and set the
required time in timeout-in-minute argument.
This is a global timeout and is applicable to all files as long as the mount exists. The timeout
need not be set each time a file needs to be inspected but for a new mount it will have to be set
again for the first time. This option becomes invalid if the operations like add-brick or remove-
brick are performed.
NOTE
You can resolve the split-brain from the gluster CLI in the following ways:
NOTE
The entry/gfid split-brain resolution is not supported using CLI. For information on
resolving gfid/entry split-brain, see Chapter 26, Manually Recovering File Split-brain .
1. Run the following command to obtain the list of files that are in split-brain:
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by performing a stat and md5
checksums on the file from the bricks. The following is the stat and md5 checksum output of a
file:
On brick b1:
# stat b1/dir/file1
File: ‘b1/dir/file1’
Size: 17 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919362 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:55:40.149897333 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 13:55:37.206880347 +0530
Birth: -
# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8 b1/dir/file1
On brick b2:
# stat b2/dir/file1
File: ‘b2/dir/file1’
Size: 13 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919365 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:54:22.974451898 +0530
Modify: 2015-03-06 13:52:22.910758923 +0530
Change: 2015-03-06 13:52:22.910758923 +0530
Birth: -
# md5sum b2/dir/file1
cb11635a45d45668a403145059c2a0d5 b2/dir/file1
You can notice the differences in the file size and md5 checksums.
2. Execute the following command along with the full file name as seen from the root of the
volume (or) the gfid-string representation of the file, which is displayed in the heal info
command's output.
For example,
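A sketch using the file shown above and a placeholder volume name:
# gluster volume heal VOLNAME split-brain bigger-file /dir/file1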
After the healing is complete, the md5sum and file size on both bricks must be the same. The following is a sample output of the stat and md5 checksums command after completion of healing the file.
On brick b1:
# stat b1/dir/file1
File: ‘b1/dir/file1’
Size: 17 Blocks: 16 IO Block: 4096 regular file
Device: fd03h/64771d Inode: 919362 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-03-06 14:17:27.752429505 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 14:17:12.880343950 +0530
Birth: -
# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8 b1/dir/file1
On brick b2:
# stat b2/dir/file1
File: ‘b2/dir/file1’
Size: 17 Blocks: 16 IO Block: 4096 regular file
Device: fd03h/64771d Inode: 919365 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-03-06 14:17:23.249403600 +0530
Modify: 2015-03-06 13:55:37.206880000 +0530
Change: 2015-03-06 14:17:12.881343955 +0530
Birth: -
# md5sum b2/dir/file1
040751929ceabf77c3c0b3b662f341a8 b2/dir/file1
1. Run the following command to obtain the list of files that are in split-brain:
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by performing a stat and md5
checksums on the file from the bricks. The following is the stat and md5 checksum output of a
file:
On brick b1:
stat b1/file4
File: ‘b1/file4’
Size: 4 Blocks: 16 IO Block: 4096
regular file
Device: fd03h/64771d Inode: 919356 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919358 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
Birth: -
# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1 b2/file4
You can notice the differences in the md5 checksums, and the modify time.
In this command, FILE can be either the full file name as seen from the root of the volume or
the gfid-string representation of the file.
For example,
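A sketch using the file shown above and a placeholder volume name:
# gluster volume heal VOLNAME split-brain latest-mtime /file4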
After the healing is complete, the md5 checksum, file size, and modify time on both bricks must be the same. The following is a sample output of the stat and md5 checksums command after completion of healing the file. You can notice that the file has been healed using the brick having the latest mtime (brick b1, in this example) as the source.
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919356 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4 Blocks: 16 IO Block: 4096
regular file
Device: fd03h/64771d Inode: 919358 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
Birth:
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183 b2/file4
1. Run the following command to obtain the list of files that are in split-brain:
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by performing a stat and md5
checksums on the file from the bricks. The following is the stat and md5 checksum output of a
file:
On brick b1:
stat b1/file4
File: ‘b1/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919356 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919358 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
Birth: -
# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1 b2/file4
You can notice the differences in the file size and md5 checksums.
For example,
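A sketch using placeholder volume, brick, and file names:
# gluster volume heal VOLNAME split-brain source-brick HOSTNAME:BRICKNAME /file4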
After the healing is complete, the md5 checksum and file size on both bricks must be the same. The following is a sample output of the stat and md5 checksums command after completion of healing the file.
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919356 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4 Blocks: 16 IO Block: 4096 regular
file
Device: fd03h/64771d Inode: 919358 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/
root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
Birth: -
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183 b2/file4
1. Run the following command to obtain the list of files that are in split-brain:
From the command output, identify the files that are in split-brain.
In this command, for all the files that are in split-brain in this replica,
<HOSTNAME:BRICKNAME> is taken as source for healing.
For example,
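A sketch using placeholder volume and brick names:
# gluster volume heal VOLNAME split-brain source-brick HOSTNAME:BRICKNAME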
Multithreaded Self-heal
The self-heal daemon has the capability to handle multiple heals in parallel and is supported on Replicate and Distribute-replicate volumes. However, increasing the number of heals has an impact on I/O performance, so the following options have been provided. The cluster.shd-max-threads volume option controls the number of entries that can be self-healed in parallel on each replica by the self-heal daemon. Using the cluster.shd-wait-qlength volume option, you can configure the number of entries that must be kept in the queue for self-heal daemon threads to take up as soon as any of the threads are free to heal.
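Sketches of the two options being set; the values shown are illustrative only:
# gluster volume set VOLNAME cluster.shd-max-threads 3
# gluster volume set VOLNAME cluster.shd-wait-qlength 1024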
There are various commands that can be used to check the healing status of volumes and files, or to
manually initiate healing:
For example, to view the list of files on test-volume that need healing:
Brick server2:/gfs/test-volume_1
/95.txt
/32.txt
/66.txt
/35.txt
/18.txt
/26.txt - Possibly undergoing heal
/47.txt
/55.txt
/85.txt - Possibly undergoing heal
...
Number of entries: 101
For example, to view the list of files on test-volume that are in a split-brain state:
For a Distributed Dispersed volume, there will be multiple sets of bricks (subvolumes) that store data with erasure coding. All the files are distributed over these sets of erasure coded subvolumes. In this scenario, even if a redundant number of bricks is lost from every dispersed subvolume, there is no data loss.
For example, assume you have a Distributed Dispersed volume of configuration 2 x (4 + 2). Here, you have two sets of dispersed subvolumes where the data is erasure coded between 6 bricks with 2 bricks for redundancy. The files will be stored in one of these dispersed subvolumes. Therefore, even if we lose two bricks from each set, there is no data loss.
Brick Configurations
The following table lists the brick layout details of multiple server/disk configurations for dispersed
and distributed dispersed volumes.
Table 11.2. Brick Configurations for Dispersed and Distributed Dispersed Volumes
Redundancy Level | Supported Configuration | Bricks per Server per Subvolume | Node Loss | Max brick failure count within a subvolume | Compatible Server Node count | Increment Size (no. of nodes) | Min number of sub-volumes | Total Spindles | Tolerated HDD Failure Percentage

12 HDD Chassis
2 | 4+2 | 2 | 1 | 2 | 3 | 3 | 6 | 36 | 33.33%
2 | 4+2 | 1 | 2 | 2 | 6 | 6 | 12 | 72 | 33.33%
2 | 8+2 | 2 | 1 | 2 | 5 | 5 | 6 | 60 | 20.00%
2 | 8+2 | 1 | 2 | 2 | 10 | 10 | 12 | 120 | 20.00%
4 | 8+4 | 4 | 1 | 4 | 3 | 3 | 3 | 36 | 33.33%
4 | 8+4 | 2 | 2 | 4 | 6 | 6 | 6 | 72 | 33.33%
4 | 8+4 | 1 | 4 | 4 | 12 | 12 | 12 | 144 | 33.33%
4 | 16+4 | 4 | 1 | 4 | 5 | 5 | 3 | 60 | 20.00%
4 | 16+4 | 2 | 2 | 4 | 10 | 10 | 6 | 120 | 20.00%
4 | 16+4 | 1 | 4 | 4 | 20 | 20 | 12 | 240 | 20.00%

24 HDD Chassis
2 | 4+2 | 2 | 1 | 2 | 3 | 3 | 12 | 72 | 33.33%
2 | 4+2 | 1 | 2 | 2 | 6 | 6 | 24 | 144 | 33.33%
2 | 8+2 | 2 | 1 | 2 | 5 | 5 | 12 | 120 | 20.00%
2 | 8+2 | 1 | 2 | 2 | 10 | 10 | 24 | 240 | 20.00%
4 | 8+4 | 4 | 1 | 4 | 3 | 3 | 6 | 72 | 33.33%
4 | 8+4 | 2 | 2 | 4 | 6 | 6 | 12 | 144 | 33.33%
4 | 8+4 | 1 | 4 | 4 | 12 | 12 | 24 | 288 | 33.33%
4 | 16+4 | 4 | 1 | 4 | 5 | 5 | 6 | 120 | 20.00%
4 | 16+4 | 2 | 2 | 4 | 10 | 10 | 12 | 240 | 20.00%
4 | 16+4 | 1 | 4 | 4 | 20 | 20 | 24 | 480 | 20.00%

36 HDD Chassis
2 | 4+2 | 1 | 2 | 2 | 6 | 6 | 36 | 216 | 33.33%
2 | 8+2 | 1 | 2 | 2 | 10 | 10 | 36 | 360 | 20.00%
4 | 8+4 | 2 | 2 | 4 | 6 | 6 | 18 | 216 | 33.33%
4 | 8+4 | 1 | 4 | 4 | 12 | 12 | 36 | 432 | 33.33%
4 | 16+4 | 4 | 1 | 4 | 5 | 5 | 9 | 180 | 20.00%
4 | 16+4 | 2 | 2 | 4 | 10 | 10 | 18 | 360 | 20.00%
4 | 16+4 | 1 | 4 | 4 | 20 | 20 | 36 | 720 | 20.00%

60 HDD Chassis
2 | 4+2 | 1 | 2 | 2 | 6 | 6 | 60 | 360 | 33.33%
2 | 8+2 | 1 | 2 | 2 | 10 | 10 | 60 | 600 | 20.00%
4 | 8+4 | 2 | 2 | 4 | 6 | 6 | 30 | 360 | 33.33%
4 | 8+4 | 1 | 4 | 4 | 12 | 12 | 60 | 720 | 33.33%
4 | 16+4 | 4 | 1 | 4 | 5 | 5 | 15 | 300 | 20.00%
4 | 16+4 | 2 | 2 | 4 | 10 | 10 | 30 | 600 | 20.00%
4 | 16+4 | 1 | 4 | 4 | 20 | 20 | 60 | 1200 | 20.00%
This example's brick configuration is explained in row 1 of Table 11.2, “Brick Configurations for
Dispersed and Distributed Dispersed Volumes”.
With this server-to-spindle ratio, 36 disks/spindles are allocated for the dispersed volume
configuration. For example, to create a compact 4+2 dispersed volume using 6 spindles from the total
disk pool over three servers, run the following command:
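A sketch of the create command matching the brick layout shown in the volume information below (the trailing force keyword corresponds to the --force parameter mentioned in the note that follows):
# gluster volume create VOLNAME disperse-data 4 redundancy 2 server1:/rhgs/brick1 server1:/rhgs/brick2 server2:/rhgs/brick3 server2:/rhgs/brick4 server3:/rhgs/brick5 server3:/rhgs/brick6 force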
Note that the --force parameter is required because this configuration is not optimal in terms of
fault tolerance. Since each server provides two bricks, this configuration has a greater risk to data
availability if a server goes offline than it would if each brick was provided by a separate server.
Run the gluster volume info command to view the volume information.
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/brick1
Brick2: server1:/rhgs/brick2
Brick3: server2:/rhgs/brick3
Brick4: server2:/rhgs/brick4
Brick5: server3:/rhgs/brick5
Brick6: server3:/rhgs/brick6
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Additionally, you can convert the dispersed volume to a distributed dispersed volume in increments of
4+2. Add six bricks from the disk pool using the following command:
Run the gluster volume info command to view distributed dispersed volume information.
Using this configuration example, you can create configuration combinations of 6 x (4 + 2) distributed
dispersed volumes. This example configuration has tolerance up to 12 brick failures.
For details about creating an optimal configuration, see Section 5.9, “Creating Dispersed Volumes” .
Note that the --force parameter is required because this configuration is not optimal in terms of
fault tolerance. Since each server provides more than one brick, this configuration has a greater risk to
data availability if a server goes offline than it would if each brick was provided by a separate server.
For details about creating an optimal configuration, see Section 5.9, “Creating Dispersed Volumes” .
In this example, there are m bricks (refer to Section 5.9, “Creating Dispersed Volumes” for information on the n = k+m equation) from a dispersed subvolume on each server. If you add more than m bricks from a dispersed subvolume on server S, and if the server S goes down, data will be unavailable.
If S (a single column in the above diagram) goes down, there is no data loss, but if there is any
additional hardware failure, either another node going down or a storage device failure, there would be
immediate data loss.
Redundancy Comparison
The following chart illustrates the redundancy comparison of all supported dispersed volume
configurations.
CHAPTER 12. MANAGING RED HAT GLUSTER STORAGE LOGS
Table 12.1.
Component/Service Name | Location of the log file | Remarks
Quota | /var/log/glusterfs/quota-crawl.log | Whenever quota is enabled, a file system crawl is performed and the corresponding log is stored in this file.
Quota | /var/log/glusterfs/quota-mount-VOLNAME.log | An auxiliary FUSE client is mounted in <gluster-run-dir>/VOLNAME of the glusterFS and the corresponding client logs are found in this file.
Bitrot | /var/log/glusterfs/bitd.log and /var/log/glusterfs/scrub.log |
Geo-replication | /var/log/glusterfs/geo-replication/<master> and /var/log/glusterfs/geo-replication-slaves |
gluster volume heal VOLNAME info command | /var/log/glusterfs/glfsheal-VOLNAME.log | One log file per server on which the command is executed.
gluster-swift | /var/log/messages |
SwiftKrbAuth | /var/log/httpd/error_log |
Command Line Interface logs | /var/log/glusterfs/cli.log | This file captures log entries for every command that is executed on the Command Line Interface (CLI).
To know more about these options, see topic Configuring Volume Options in the Red Hat Gluster Storage
Administration Guide.
# glusterd --log-format=<value>
# glusterd --log-format=with-msg-id
# glusterd --log-format=no-msg-id
For a list of error messages, see the Red Hat Gluster Storage Error Message Guide.
See Also:
For example, if the log level is set to INFO, only CRITICAL, ERROR, WARNING, and INFO messages are
logged.
CRITICAL
ERROR
WARNING
INFO
DEBUG
TRACE
IMPORTANT
Setting the log level to TRACE or DEBUG generates a very large number of log
messages and can lead to disks running out of space very quickly.
Edit the /etc/sysconfig/glusterd file, and set the value of the LOG_LEVEL parameter to the log
level that you want glusterd to use.
## Set custom log file and log level (below are defaults)
#LOG_FILE='/var/log/glusterfs/glusterd.log'
LOG_LEVEL='VALUE'
This change does not take effect until glusterd is started or restarted with the service or
systemctl command.
In the /etc/sysconfig/glusterd file, locate the LOG_LEVEL parameter and set its value to
WARNING.
## Set custom log file and log level (below are defaults)
#LOG_FILE='/var/log/glusterfs/glusterd.log'
LOG_LEVEL='WARNING'
Then start or restart the glusterd service. On Red Hat Enterprise Linux 7, run:
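A command along these lines restarts the service on Red Hat Enterprise Linux 7:
# systemctl restart glusterd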
See Also:
# glusterd --log-flush-timeout=<value>
# glusterd --log-flush-timeout=60
# glusterd --log-buf-size=<value>
# glusterd --log-buf-size=10
NOTE
See Also:
Master-log-file - log file for the process that monitors the master volume.
Master-gluster-log-file - log file for the maintenance mount point that the geo-
replication module uses to monitor the master volume.
Slave-gluster-log-file - If the slave is a Red Hat Gluster Storage Volume, this log file is
the slave's counterpart of Master-gluster-log-file.
For example:
1. On the master, run the following command to display the session-owner details:
For example:
2. On the slave, run the following command with the session-owner value from the previous step:
For example:
CHAPTER 13. MANAGING RED HAT GLUSTER STORAGE VOLUME LIFE-CYCLE EXTENSIONS
Pre Scripts: These scripts are run before the occurrence of the event. You can write a script to
automate activities like managing system-wide services. For example, you can write a script to stop
exporting the SMB share corresponding to the volume before you stop the volume.
Post Scripts: These scripts are run after execution of the event. For example, you can write a script to
export the SMB share corresponding to the volume after you start the volume.
Creating a volume
Starting a volume
Adding a brick
Removing a brick
Stopping a volume
Deleting a volume
Naming Convention
While creating the file names of your scripts, you must follow the naming convention of the underlying file system, such as XFS.
NOTE
To enable the script, the name of the script must start with an S. Scripts run in lexicographic order of their names.
/var/lib/glusterd/hooks/1/create/
/var/lib/glusterd/hooks/1/delete/
/var/lib/glusterd/hooks/1/start/
/var/lib/glusterd/hooks/1/stop/
/var/lib/glusterd/hooks/1/set/
/var/lib/glusterd/hooks/1/add-brick/
/var/lib/glusterd/hooks/1/remove-brick/
After creating a script, you must ensure to save the script in its respective folder on all the nodes of the
trusted storage pool. The location of the script dictates whether the script must be executed before or
after an event. Scripts are provided with the command line argument --volname=VOLNAME to specify
the volume. Command-specific additional arguments are provided for the following volume operations:
Start volume
Stop volume
Set volume
-o key=value
1. Adds Samba share configuration details of the volume to the smb.conf file
2. Mounts the volume through FUSE and adds an entry in /etc/fstab for the same.
1. Removes the Samba share details of the volume from the smb.conf file
2. Unmounts the FUSE mount point and removes the corresponding entry in /etc/fstab
CHAPTER 14. MANAGING CONTAINERIZED RED HAT GLUSTER STORAGE
Containerized Red Hat Gluster Storage 3.1.2 is supported only on Red Hat Enterprise Linux Atomic
Host 7.2. For more information about installing containerized Red Hat Gluster Storage, see the Red Hat
Gluster Storage 3.3 Installation Guide: https://access.redhat.com/documentation/en-
us/red_hat_gluster_storage/3.3/html/installation_guide/.
NOTE
For Red Hat Gluster Storage 3.1.2, Erasure Coding, NFS-Ganesha, BitRot, and Data
Tiering are not supported with containerized Red Hat Gluster Storage.
14.1. PREREQUISITES
Before creating a container, execute the following steps.
1. Create the directories in the atomic host for persistent mount by executing the following
command:
2. Ensure the bricks that are required are mounted on the atomic hosts. For more information
see Section 21.2, “Brick Configuration”.
3. If Snapshot is required, then ensure that the dm-snapshot kernel module is loaded in Atomic
Host system. If it is not loaded, then load it by executing the following command:
# modprobe dm_snapshot
where,
--net=host option ensures that the container has full access to the network stack of the
host.
For example:
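A sketch of a docker run invocation, combining the options discussed in this section; the container name, the -d flag, and the --privileged flag are assumptions and may need adjusting for your environment:
# docker run -d --privileged=true --net=host --name glusterserver1 -v /etc/glusterfs:/etc/glusterfs:z -v /var/lib/glusterd:/var/lib/glusterd -v /var/log/glusterfs:/var/log/glusterfs -v /mnt/brick1:/mnt/container_brick1:z rhgs3/rhgs-server-rhel7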
5ac864b5abc74a925aecc4fe9613c73e83b8c54a846c36107aa8e2960eeb97b4
NOTE
In the above command, the following options ensure that the gluster configuration is persistent.
-v /etc/glusterfs:/etc/glusterfs:z -v
/var/lib/glusterd:/var/lib/glusterd -v
/var/log/glusterfs:/var/log/glusterfs
where, /mnt/brick1 is the mountpoint of the brick in the atomic host and
:/mnt/container_brick1 is the mountpoint of the brick in the container.
For example:
/mnt/brick1:/mnt/container_brick1:z rhgs3/rhgs-server-rhel7
5da2bc217c0852d2b1bfe4fb31e0181753410071584b4e38bd77d7502cd3e92b
# docker ps
For example:
# docker ps
For example:
3. To verify if the bricks are mounted successfully, execute the following command:
CHAPTER 15. DETECTING BITROT
The gluster volume bitrot command scans all the bricks in a volume for BitRot issues in a
process known as scrubbing. The process calculates the checksum for each file or object, and
compares that checksum against the actual data of the file. When BitRot is detected in a file, that file is
marked as corrupted, and the detected errors are logged in the following files:
/var/log/glusterfs/bitd.log
/var/log/glusterfs/scrub.log
This command prints a summary of scrub status on the specified volume, including various configuration details and the location of the bitrot and scrubber error logs for this volume. It also prints details of each node scanned for errors, along with identifiers for any corrupted objects located.
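A sketch of the command (substitute your volume name):
# gluster volume bitrot VOLNAME scrub status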
IMPORTANT
Mount all volumes using the -oaux-gfid-mount mount option, and enable GFID-to-
path translation on each volume by running the following command.
Files created before this option was enabled must be looked up with the find command.
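A sketch of the option being enabled (substitute your volume name):
# gluster volume set VOLNAME build-pgfid on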
For files created before GFID-to-path translation was enabled, use the find command to
determine the path of the corrupted file and the index file that match the identifying GFID.
# mkdir /mnt/recovery
$ stat /mnt/recovery/corrupt-file
If you do not have client self-heal enabled, you must manually heal the volume with the
following command.
# umount /mnt/recovery
# rmdir /mnt/recovery
The next time that the bitrot scrubber runs, this GFID is no longer listed (unless it has become
corrupted again).
CHAPTER 16. INCREMENTAL BACKUP ASSISTANCE USING GLUSTERFIND
Glusterfind Create
Glusterfind Pre
Glusterfind Post
Glusterfind Query
Glusterfind List
Glusterfind Delete
NOTE
All the glusterfind configuration commands such as, glusterfind pre, glusterfind post,
glusterfind list, and glusterfind delete for a session have to be executed only on the node
on which session is created.
Glusterfind Create
To create a session for a particular instance in the volume, execute the following command:
where,
--reset-session-time: forces reset of the session time. The next incremental run will start from this time.
volname: Name of the volume for which the create command is executed.
For example:
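A sketch using the session and volume names that appear later in the glusterfind list output:
# glusterfind create sess_vol1 vol1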
Glusterfind Pre
To retrieve the list of modified files and directories and store it in the outfile, execute the following
command:
where,
--regenerate-outfile: Regenerates a new outfile and discards the outfile generated from the last pre
command.
--no-encode: The file paths are encoded by default in the output file. This option disables encoding of
file paths.
--field-separator: Specifies the character/s that glusterfind output uses to separate fields. By default
this is a single space, but if your file names contain spaces, you may want to change the delimiter so
you can parse the output of glusterfind automatically.
volname: Name of the volume for which the pre command is executed.
For example:
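A sketch, assuming the same session and an illustrative output file path:
# glusterfind pre sess_vol1 vol1 /tmp/outfile.txt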
NOTE
The output format is <TYPE> <PATH1> <PATH2>. Possible TYPE values are NEW,
MODIFY, DELETE, and RENAME. PATH2 is applicable only if the type is RENAME. File
paths are encoded by default, for example:
NEW file1
NEW dir1%2Ffile2
MODIFY dir3%2Fdir4%2Ftest3
RENAME test1 dir1%2F%2Ftest1new
DELETE test2
With the --no-encode option, the same entries are listed with unencoded paths:
NEW file1
NEW dir1/file2
MODIFY dir3/dir4/test3
RENAME test1 dir1/test1new
DELETE test2
Glusterfind Post
The following command is run to update the session time:
where,
volname: Name of the volume for which the post command is executed.
For example:
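A minimal sketch, assuming the upstream glusterfind syntax; the session and volume names are illustrative:
# glusterfind post SessionName volname
# glusterfind post sess_vol1 vol1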
Glusterfind List
To list all the active sessions and the corresponding volumes present in the cluster, execute the
following command:
where,
--volume VOLUME: Displays all the active sessions corresponding to that volume
For example:
# glusterfind list
SESSION VOLUME SESSION TIME
--------------------------------------------------
sess_vol1 vol1 2015-06-22 22:22:53
Glusterfind Query
The glusterfind query subcommand provides a list of changed files based on a specified time
stamp. These commands do not check any change log information. Use the glusterfind query
subcommand when your backup software maintains its own checkpoints and time stamps outside
glusterfind.
To retrieve files changed between two timestamps, run the following command:
Time stamps are expected in seconds since the Linux epoch date (1970-01-01 00:00:00 UTC). Current
Linux epoch time can be output by running echo $(date +'%s') on the command line.
You can retrieve all files in the volume by running the following command:
When running a full find operation, you can also retrieve a subset of files according to a tag. For
example, to find all new files on a volume, run the following command:
By default, the output of glusterfind uses a single space to separate fields. If your file names contain
spaces, you may want to change the delimiter in order to parse the output of glusterfind automatically.
You can set the delimiter to one or more characters by using the --field-separator option. The
following command sets the field separator to ==.
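The commands referenced above are not reproduced in this extract; the following hedged sketch assumes the upstream glusterfind query options, with volname and outfile.txt as illustrative names:
# glusterfind query volname --since-time TSTAMP1 --end-time TSTAMP2 outfile.txt
# glusterfind query volname --full outfile.txt
# glusterfind query volname --full --tag-for-full-find NEW outfile.txt
# glusterfind query volname --full outfile.txt --field-separator "=="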
Glusterfind Delete
To clear out all the session information associated with that particular session, execute the following
command:
where,
volname: Name of the volume for which the delete command is executed.
For example:
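A minimal sketch, assuming the upstream glusterfind syntax; the session and volume names are illustrative:
# glusterfind delete SessionName volname
# glusterfind delete sess_vol1 vol1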
CHAPTER 17. MANAGING TIERING
IMPORTANT
Data is moved, not copied, from one tier to another. When a file is moved to one tier, a
copy is not kept on the other tier.
Tiering monitors and identifies the activity level of the data and automatically moves the active and
inactive data to the most appropriate storage tier. Moving data between hot and cold storage tiers is
a computationally expensive task. To address this, Red Hat Gluster Storage supports automated
promotion and demotion of data within a volume in the background so as to minimize the impact on
foreground I/O. Data becomes hot or cold based on the rate at which it is accessed. If access to a file
increases, it moves to the hot tier or retains its place in the hot tier. If the file is not accessed for a
while, it moves to the cold tier or retains its place in the cold tier. Hence, data can move in either
direction, based entirely on access frequency.
Different sub-volume types act as hot and cold tiers, and data is automatically assigned or reassigned a
“temperature” based on the frequency of access. Red Hat Gluster Storage allows you to attach
faster-performing disks as a hot tier and use the existing volume as the cold tier; together, the hot and
cold tiers form a single tiered volume. For example, the existing volume may be distributed-dispersed
on HDDs while the hot tier is distributed-replicated on SSDs.
Hot Tier
The hot tier is the tiering volume created using better-performing storage, such as SSDs. Frequently
accessed data is placed in this highest-performance and most expensive tier. A hot tier volume can be a
distributed volume or a distributed-replicated volume.
WARNING
Distributed volumes can suffer significant data loss during a disk or server failure
because directory contents are spread randomly across the bricks in the volume.
Red Hat therefore recommends creating a distributed-replicated tier volume.
Cold Tier
The cold tier is the existing Red Hat Gluster Storage volume created using slower storage, such as
spinning disks. Inactive or infrequently accessed data is placed in this lowest-cost tier.
Data Migration
Tiering automatically migrates files between hot tier and cold tier to improve the storage performance
and resource use.
The following diagram illustrates how tiering works when attached to a distributed-dispersed volume.
Here, the existing distributed-dispersed volume becomes the cold tier and the new fast, expensive
storage device acts as the hot tier. Frequently accessed files are migrated from the cold tier to the
hot tier for better performance.
Native client support for tiering is limited to Red Hat Enterprise Linux version 6.7, 6.8 and 7.x
clients. Tiered volumes cannot be mounted by Red Hat Enterprise Linux 5.x clients.
Tiering works only with cache-friendly workloads. Attaching a tier volume to a cache-unfriendly
workload leads to slow performance. In a cache-friendly workload, most of the reads and writes
access a subset of the total data; this subset fits on the hot tier and changes only infrequently.
The tiering feature is supported only on Red Hat Enterprise Linux 7 based Red Hat Gluster Storage.
It is not supported on Red Hat Enterprise Linux 6 based Red Hat Gluster Storage.
Only FUSE and gluster-nfs access is supported. Server Message Block (SMB) and nfs-ganesha
access to tiered volumes is not supported.
Creating a snapshot of a tiered volume is supported. Snapshot clones are not supported with
tiered volumes.
When you run tier detach commit or tier detach force, ongoing I/O operations may
fail with a Transport endpoint is not connected error.
Files on which POSIX locks have been taken are not migrated until all locks are released.
Add brick, remove brick, and rebalance operations are not supported on the tiered volume. For
information on expanding a tiered volume, see Section 11.5.1, “Expanding a Tiered Volume” and
for information on shrinking a tiered volume, see Section 11.6.2, “Shrinking a Tiered Volume ”
Red Hat strongly recommends provisioning your storage generously before attaching a tier.
You create a normal volume and then attach bricks to it; the attached bricks become the hot tier (a
sketch of the attach command follows this procedure).
2. Run the gluster volume info command to optionally display the volume information.
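A hedged sketch of the attach step, assuming the gluster volume tier attach syntax; the volume name, server names, and brick paths are illustrative:
# gluster volume tier VOLNAME attach [replica COUNT] NEW-BRICK...
# gluster volume tier test-volume attach replica 2 server1:/rhgs/ssd/brick1 server2:/rhgs/ssd/brick2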
The tier start command is triggered automatically after the tier has been attached. In some cases, if
the tier process has not started, you must start it manually using the gluster volume tier
VOLNAME start force command.
IMPORTANT
1. Stop geo-replication between the master and slave, using the following command:
For example:
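A minimal sketch of stopping the session, assuming the standard geo-replication syntax; the master volume, slave host, and slave volume names are illustrative:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
# gluster volume geo-replication Volume1 storage.example.com::slave-vol stop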
For example, to create a distributed-replicated tier volume with replica count two:
For example
4. Verify whether the geo-replication session has started with the tier's bricks, using the following
command:
For example,
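A minimal sketch of the verification step, assuming the standard geo-replication status syntax; the names are illustrative:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
# gluster volume geo-replication Volume1 storage.example.com::slave-vol status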
The promotion and demotion of files is determined by how full the hot tier is. Data accumulates on the
hot tier until it reaches the low watermark, even if it is not accessed for a period of time. This prevents
files from being demoted unnecessarily when there is plenty of free space on the hot tier. When the
hot tier is fuller than the low watermark but below the high watermark, data is randomly
promoted and demoted, where the likelihood of promotion decreases as the tier becomes fuller; the
opposite holds for demotion. If the hot tier is fuller than the high watermark, promotions stop and
demotions happen more frequently in order to free up space.
The following diagram illustrates how cache mode works and the example values you can set.
To set the percentage for promotion and demotion of files, run the following commands:
To set the frequency for the promotion and demotion of files, run the following command:
Set the read and write frequency threshold by executing the following command:
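The individual commands are not reproduced in this extract; the following hedged sketch assumes the tiering volume options available in this release, and the values shown are only illustrative examples:
# gluster volume set VOLNAME cluster.watermark-hi 90
# gluster volume set VOLNAME cluster.watermark-low 75
# gluster volume set VOLNAME cluster.tier-promote-frequency 120
# gluster volume set VOLNAME cluster.tier-demote-frequency 3600
# gluster volume set VOLNAME cluster.write-freq-threshold 2
# gluster volume set VOLNAME cluster.read-freq-threshold 5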
NOTE
A value of 0 indicates that the threshold value is not considered. Any value in the
range of 1-1000 denotes the number of times the contents of a file must be modified
for the file to be considered for promotion or demotion.
NOTE
A value of 0 indicates that the threshold value is not considered. Any value in the
range of 1-1000 denotes the number of times the contents of a file must be accessed
for the file to be considered for promotion or demotion.
If the cluster.tier-max-mb count is not set, then the default data size is set to 4000 MB.
If the cluster.tier-max-files count is not set, then the default count is set to 10000.
For example,
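A hedged sketch of setting these limits, assuming the cluster.tier-max-mb and cluster.tier-max-files volume options; the values shown simply repeat the defaults mentioned above:
# gluster volume set VOLNAME cluster.tier-max-mb 4000
# gluster volume set VOLNAME cluster.tier-max-files 10000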
For example,
2. Monitor the status of detach tier until it displays the status as complete.
For example,
NOTE
It is possible that some files are not migrated to the cold tier on a detach
operation for various reasons, such as POSIX locks being held on them. Check for
such files on the hot tier bricks; you can either move the files manually, or turn off
the applications (which would presumably unlock the files) and stop and restart
detach tier to retry.
3. When the tier is detached successfully as shown in the previous status command, run the
following command to commit the tier detach:
For example,
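A hedged sketch of the detach sequence, assuming the gluster volume tier detach syntax; VOLNAME is a placeholder:
# gluster volume tier VOLNAME detach start
# gluster volume tier VOLNAME detach status
# gluster volume tier VOLNAME detach commit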
NOTE
When you run tier detach commit or tier detach force, ongoing I/O operations
may fail with a Transport endpoint is not connected error.
After the detach tier commit is completed, you can verify that the volume is no longer a tiered volume by
running the gluster volume info command.
For example,
2. Monitor the status of detach tier until it displays the status as complete.
For example,
NOTE
There may be some files that were not moved. Such files may have
been locked by the user, which prevented them from moving to the cold tier
during the detach operation. You must check for such files. If you find any, you
can either move the files manually, or turn off the applications (which would
presumably unlock the files) and stop and restart detach tier to retry.
3. Set a checkpoint on a geo-replication session to ensure that all the data in that cold-tier is
synced to the slave. For more information on geo-replication checkpoints, see Section 10.4.4.1,
“Geo-replication Checkpoints”.
For example,
4. Use the following command to verify the checkpoint completion for the geo-replication session
5. Stop geo-replication between the master and slave, using the following command:
For example:
For example,
After the detach tier commit is completed, you can verify that the volume is no longer a tiered
volume by running the gluster volume info command.
For example,
PART V. MONITOR AND TUNE
CHAPTER 18. MONITORING RED HAT GLUSTER STORAGE
To view metrics data and monitor Red Hat Gluster Storage servers with Red Hat Gluster Storage Web
Administration, see the following publications:
You can also monitor Red Hat Gluster Storage servers on the Nagios platform: you can monitor the
Red Hat Gluster Storage trusted storage pool, hosts, volumes, and services, and view utilization,
status, alerts, and notifications for status and utilization changes.
Using Nagios, the physical resources, logical resources, and processes (CPU, memory, disk, network,
swap, cluster, volume, brick, host, nfs, shd, quotad, ctdb, smb, glusterd, quota, geo-replication,
self-heal, and server quorum) can be monitored. You can view the utilization and status through the
Nagios server GUI.
Red Hat Gluster Storage trusted storage pool monitoring can be set up in one of the three deployment
scenarios listed below:
This chapter describes the procedures for deploying Nagios on Red Hat Gluster Storage node and Red
Hat Enterprise Linux node. For information on deploying Nagios on Red Hat Gluster Storage Console
node, see Red Hat Gluster Storage Console Administration Guide.
The following diagram illustrates deployment of Nagios on Red Hat Gluster Storage node.
The following diagram illustrates deployment of Nagios on Red Hat Enterprise Linux node.
18.1. PREREQUISITES
Ensure that you register using Subscription Manager or Red Hat Network Classic (RHN) and enable the
Nagios repositories before installing the Nagios Server.
NOTE
Register using Red Hat Network (RHN) Classic only if you are a Red Hat Satellite user.
To install Nagios on a Red Hat Gluster Storage node based on RHEL 7, subscribe to the
rh-gluster-3-nagios-for-rhel-7-server-rpms repository.
Registering using Red Hat Network (RHN) Classic and subscribing to Nagios channels
NOTE
Once Nagios is installed on the Red Hat Gluster Storage or RHEL node, verify that the
following booleans are ON by running the getsebool -a | grep nagios command:
nagios_run_sudo --> on
nagios_run_pnp4nagios --> on
nagios
Core program, web interface and configuration files for Nagios server.
python-cpopen
Python package for creating sub-processes in a simple and safe manner.
python-argparse
libmcrypt
Encryption algorithm library.
rrdtool
Round Robin Database Tool to store and display time-series data.
pynag
Python modules and utilities for Nagios plugins and configuration.
check-mk
General purpose Nagios-plugin for retrieving data.
mod_python
An embedded Python interpreter for the Apache HTTP Server.
nrpe
Monitoring agent for Nagios.
nsca
Nagios service check acceptor.
nagios-plugins
Common monitoring plug-ins for nagios.
gluster-nagios-common
Common libraries, tools, configurations for Gluster node and Nagios server add-ons.
nagios-server-addons
Gluster node management add-ons for Nagios.
You must install Nagios on the node that will be used as the Nagios server.
NOTE
If SELinux is configured, the sebooleans must be enabled on all Red Hat Gluster Storage
nodes and the node on which Nagios server is installed.
Enable the following sebooleans on Red Hat Enterprise Linux node if Nagios server is
installed.
# setsebool -P logging_syslogd_run_nagios_plugins on
# setsebool -P nagios_run_sudo on
1. In the /etc/nagios/nrpe.cfg file, add the central Nagios server IP address as shown below:
allowed_hosts=127.0.0.1, NagiosServer-HostName-or-IPaddress
NOTE
The host name of the node is used while configuring Nagios server using
auto-discovery. To view the host name, run hostname command.
To start glusterpmd service automatically when the system reboots, run chkconfig --add
glusterpmd command.
You can start the glusterpmd service using service glusterpmd start command and
stop the service using service glusterpmd stop command.
The glusterpmd service is a Red Hat Gluster Storage process monitoring service that runs on
every Red Hat Gluster Storage node to monitor the glusterd, self-heal, smb, quotad, ctdbd, and
brick services and to alert the user when these services go down. The glusterpmd service
sends the detailed status of the services it manages to the Nagios server whenever there is a
state change in any of them.
This service uses the /etc/nagios/nagios_server.conf file to get the Nagios server name
and the local host name given in the Nagios server. The nagios_server.conf file is configured
by auto-discovery.
This section describes how you can monitor the Red Hat Gluster Storage trusted storage pool.
For more information on Nagios Configuration files, see Chapter 22, Nagios Configuration Files
NOTE
For -c, provide a cluster name (a logical name for the cluster) and for -H, provide the host
name or ip address of a node in the Red Hat Gluster Storage trusted storage pool.
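A minimal sketch of the auto-discovery invocation, assuming the configure-gluster-nagios utility shipped in the nagios-server-addons package; the cluster name and host address are illustrative:
# configure-gluster-nagios -c cluster_auto -H HostName-or-IPaddress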
2. Enter the current Nagios server host name or IP address to be configured on all the nodes.
All the hosts, volumes and bricks are added and displayed.
https://NagiosServer-HostName-or-IPaddress/nagios
NOTE
# nagios -v /etc/nagios/nagios.cfg
If an error occurs, verify the parameters set in /etc/nagios/nagios.cfg and update the
configuration files.
3. Log into the Nagios server GUI using the following URL with the Nagios Administrator user
name and password.
https://NagiosServer-HostName-or-IPaddress/nagios
NOTE
To change the default password, see Changing Nagios Password section in Red
Hat Gluster Storage Administration Guide.
4. Click Services in the left pane of the Nagios server GUI and verify the list of hosts and
services displayed.
To view the details, log into the Nagios Server GUI by using the following URL.
https://NagiosServer-HostName-or-IPaddress/nagios
Cluster Overview
To view the overview of the hosts and services being monitored, click Tactical Overview in the left
pane. The overview of Network Outages, Hosts, Services, and Monitoring Features are displayed.
Host Status
To view the status summary of all the hosts, click Summary under Host Groups in the left pane.
To view the list of all hosts and their status, click Hosts in the left pane.
NOTE
The cluster is also shown as a host in Nagios, and it has all the volume services.
Service Status
To view the list of all hosts and their service status click Services in the left pane.
NOTE
In the left pane of Nagios Server GUI, click Availability and Trends under the
Reports field to view the Host and Services Availability and Trends.
Host Services
1. Click Hosts in the left pane. The list of hosts are displayed.
3. Select the service name to view the Service State Information. You can view the utilization of
the following services:
Memory
Swap
CPU
Network
Brick
Disk
The Brick/Disk Utilization performance data has four sets of information for every mount
point: brick/disk space details, brick/disk inode details, thin pool utilization, and
thin pool metadata utilization (if the brick/disk is backed by a thin LV).
For example,
As part of the disk utilization service, the following mount points are monitored if
available: /, /boot, /home, /var, and /usr.
4. To view the utilization graph, click the graph icon corresponding to the service name. The
utilization graph is displayed.
5. To monitor status, click on the service name. You can monitor the status for the following
resources:
Disk
Network
6. To monitor process, click on the process name. You can monitor the following processes:
Self-Heal (Self-Heal)
CTDB
SMB
Cluster Services
1. Click Hosts in the left pane. The list of hosts and clusters are displayed.
3. To view the utilization graph, click the graph icon corresponding to the service name. You can monitor the
following utilizations:
Cluster
Volume
4. To monitor status, click on the service name. You can monitor the status for the following
resources:
Host
Volume
Brick
5. To monitor cluster services, click on the service name. You can monitor the following:
Volume Quota
Volume Geo-replication
Volume Split-Brain
Cluster Quorum (A cluster quorum service would be present only when there are volumes
in the cluster.)
1. Log in to the Nagios Server GUI using the following URL in your browser with the nagiosadmin user
name and password.
https://NagiosServer-HostName-or-IPaddress/nagios
2. Click Services in left pane of Nagios server GUI and click Cluster Auto Config.
3. In Service Commands, click Re-schedule the next check of this service. The
Command Options window is displayed.
1. Log in to the Nagios Server GUI using the following URL in your browser with the
nagiosadmin user name and password.
https://NagiosServer-HostName-or-IPaddress/nagios
2. Click Hosts in left pane of Nagios server GUI and select the host.
2. Click Services in left pane of Nagios server GUI and select the service to enable or
disable.
2. Click Hosts in left pane of Nagios server GUI and select the host to enable or disable all
services notifications.
4. Click Commit to enable or disable all service notifications for the selected host.
2. Click Process Info under Systems section from left pane of Nagios server GUI.
4. Click Commit.
1. Log in to the Nagios Server GUI using the following URL in your browser with the
nagiosadmin user name and password.
https://NagiosServer-HostName-or-IPaddress/nagios
2. Click Services in left pane of Nagios server GUI and select the service to enable
monitoring.
3. Click Enable active checks of this service from the Service Commands and
click Commit.
4. Click Start accepting passive checks for this service from the Service
Commands and click Commit.
1. Log in to the Nagios Server GUI using the following URL in your browser with the
nagiosadmin user name and password.
https://NagiosServer-HostName-or-IPaddress/nagios
2. Click Services in left pane of Nagios server GUI and select the service to disable
monitoring.
3. Click Disable active checks of this service from the Service Commands and
click Commit.
4. Click Stop accepting passive checks for this service from the Service
Commands and click Commit.
NOTE
Nagios sends email and SNMP notifications once a service status changes. Refer to the
Configuring Nagios Server to Send Mail Notifications section of the Red Hat Gluster Storage 3
Console Administration Guide to configure email notification, and the Configuring Simple
Network Management Protocol (SNMP) Notification section of the Red Hat Gluster Storage 3
Administration Guide to configure SNMP notification.
Table 18.1.

Status: CRITICAL
Message: Can't remove all hosts except sync host in 'auto' mode. Run auto discovery manually.
Description: When the host used for auto-config itself is removed from the Gluster peer list, auto-config detects this as all hosts except the synchronized host being removed from the cluster. This does not change the Nagios configuration; the user needs to run the auto-config manually.

Service: CPU Utilization
Status: OK
Message: CPU Status OK: Total CPU:4.6% Idle CPU:95.40%
Description: When CPU usage is less than 80%.
STOPPED state occurs when Geo-replication sessions are stopped.
NOT_STARTED state occurs when there are multiple Geo-replication sessions and one of them is stopped.
The volfile is locked as another transaction is in progress.
The volume is stopped or the glusterd service is down.
The volfile is locked as another transaction is in progress.
define contact {
        contact_name                    Contact1
        alias                           ContactNameAlias
        email                           email-address
        service_notification_period    24x7
        service_notification_options   w,u,c,r,f,s
        service_notification_commands  notify-service-by-email
        host_notification_period       24x7
        host_notification_options      d,u,r,f,s
        host_notification_commands     notify-host-by-email
}
define contact {
        contact_name                    Contact2
        alias                           ContactNameAlias2
        email                           email-address
        service_notification_period    24x7
        service_notification_options   w,u,c,r,f,s
        service_notification_commands  notify-service-by-email
        host_notification_period       24x7
        host_notification_options      d,u,r,f,s
        host_notification_commands     notify-host-by-email
}
The host_notification_options directive is used to define the host states for which
notifications can be sent out to this contact. Valid options are a combination of one or more of
the following:
s: Send notifications when host or service scheduled downtime starts and ends
2. To add a group to which the mail need to be sent, add the details as given below:
define contactgroup{
contactgroup_name Group1
alias GroupAlias
members Contact1,Contact2
}
define host{
name gluster-generic-host
use linux-server
notifications_enabled 1
notification_period 24x7
notification_interval 120
notification_options d,u,r,f,s
register 0
contact_groups Group1
contacts Contact1,Contact2
}
define service {
name gluster-service
use generic-service
notifications_enabled 1
notification_period 24x7
notification_options w,u,c,r,f,s
notification_interval 120
register 0
_gluster_entity Service
contact_groups Group1
contacts Contact1,Contact2
}
You can configure notification for individual services by editing the corresponding node
configuration file. For example, to configure notification for brick service, edit the
corresponding node configuration file as shown below:
define service {
use brick-service
_VOL_NAME VolumeName
__GENERATED_BY_AUTOCONFIG 1
notes Volume : VolumeName
host_name RedHatStorageNodeName
_BRICK_DIR brickpath
service_description Brick Utilization - brickpath
contact_groups Group1
contacts Contact1,Contact2
}
4. To receive detailed information on every update when Cluster Auto-Config is run, edit the
/etc/nagios/objects/commands.cfg file and add $NOTIFICATIONCOMMENT$\n after the
$SERVICEOUTPUT$\n option in the notify-service-by-email and notify-host-by-email
command definitions as shown below:
define command {
        command_name notify-service-by-email
        command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n $NOTIFICATIONCOMMENT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
The Nagios server sends notifications during status changes to the mail addresses specified in the file.
NOTE
By default, the system ensures three occurrences of the event before sending
mail notifications.
HostName-or-IP-address public
define contact {
contact_name snmp
alias Snmp Traps
email admin@ovirt.com
service_notification_period 24x7
service_notification_options w,u,c,r,f,s
service_notification_commands gluster-notify-service-by-snmp
host_notification_period 24x7
host_notification_options d,u,r,f,s
host_notification_commands gluster-notify-host-by-snmp
}
You can download the required Management Information Base (MIB) files from the URLs given
below:
NAGIOS-NOTIFY-MIB: https://github.com/nagios-plugins/nagios-
mib/blob/master/MIB/NAGIOS-NOTIFY-MIB
NAGIOS-ROOT-MIB: https://github.com/nagios-plugins/nagios-
mib/blob/master/MIB/NAGIOS-ROOT-MIB
2. Run the command given below with the new user name and type the password when prompted.
3. Add permissions for the new user in the /etc/nagios/cgi.cfg file as shown below:
authorized_for_system_information=nagiosadmin,newUserName
authorized_for_configuration_information=nagiosadmin,newUserName
authorized_for_system_commands=nagiosadmin,newUserName
authorized_for_all_services=nagiosadmin,newUserName
authorized_for_all_hosts=nagiosadmin,newUserName
authorized_for_all_service_commands=nagiosadmin,newUserName
authorized_for_all_host_commands=nagiosadmin,newUserName
5. Verify Nagios access by using the following URL in your browser, and using the user name and
password.
https://NagiosServer-HostName-or-IPaddress/nagios
2. To change the default password for the Nagios Administrator user, run the following command
with the new password:
4. Verify Nagios access by using the following URL in your browser, and using the user name and
password that was set in Step 2:
https://NagiosServer-HostName-or-IPaddress/nagios
2. Create an SSL certificate for the server using the following command:
Enter the server's host name which is used to access the Nagios Server GUI as Common Name.
3. Edit the /etc/httpd/conf.d/ssl.conf file and add path to SSL Certificate and key files
correspondingly for SSLCertificateFile and SSLCertificateKeyFile fields as shown
below:
SSLCertificateFile /etc/pki/tls/certs/nagios-ssl.crt
SSLCertificateKeyFile /etc/pki/tls/private/nagios-ssl.key
4. Edit the /etc/httpd/conf/httpd.conf file and comment the port 80 listener as shown
below:
# Listen 80
<Directory "/var/www/html">
6. Restart the httpd service on the nagios server using the following command:
The configurations are displayed as given below if the LDAP Apache module is enabled. You can
enable the LDAP Apache module by deleting the # symbol.
AuthBasicProvider
AuthLDAPURL
AuthLDAPBindDN
AuthLDAPBindPassword
3. Edit the CGI authentication file /etc/nagios/cgi.cfg as given below with the path where
Nagios is installed.
4. Uncomment the lines shown below by deleting # and set permissions for specific users:
NOTE
Replace nagiosadmin and user names with * to give any LDAP user full
functionality of Nagios.
authorized_for_system_information=user1,user2,user3
authorized_for_configuration_information=nagiosadmin,user1,user2,user3
authorized_for_system_commands=nagiosadmin,user1,user2,user3
authorized_for_all_services=nagiosadmin,user1,user2,user3
authorized_for_all_hosts=nagiosadmin,user1,user2,user3
authorized_for_all_service_commands=nagiosadmin,user1,user2,user3
authorized_for_all_host_commands=nagiosadmin,user1,user2,user3
# getsebool httpd_can_connect_ldap
# setsebool httpd_can_connect_ldap on
6. Restart httpd service and nagios server using the following commands:
For more information on Nagios Configuration files, see Chapter 22, Nagios Configuration Files
1. In the /etc/nagios/gluster directory, create a directory with the cluster name. All
configurations for the cluster are added in this directory.
define hostgroup{
hostgroup_name cluster-name
alias cluster-name
}
define host{
host_name cluster-name
alias cluster-name
use gluster-cluster
address cluster-name
}
define service {
        service_description     Cluster - Quorum
        use                     gluster-passive-service
        host_name               cluster-name
}
4. Define the Cluster Utilization service to monitor cluster utilization as shown below:
define service {
        service_description     Cluster Utilization
        use                     gluster-service-with-graph
        check_command           check_cluster_vol_usage!warning-threshold!critical-threshold
        host_name               cluster-name
}
5. Add the following service definitions for each volume in the cluster:
Volume Status service to monitor the status of the volume as shown below:
define service {
        service_description     Volume Status - volume-name
        host_name               cluster-name
        use                     gluster-service-without-graph
        _VOL_NAME               volume-name
        notes                   Volume type : Volume-Type
        check_command           check_vol_status!cluster-name!volume-name
}
define service {
        service_description     Volume Utilization - volume-name
        host_name               cluster-name
        use                     gluster-service-with-graph
        _VOL_NAME               volume-name
        notes                   Volume type : Volume-Type
        check_command           check_vol_utilization!cluster-name!volume-name!warning-threshold!critical-threshold
}
define service {
        service_description     Volume Split-brain status - volume-name
        host_name               cluster-name
        use                     gluster-service-without-graph
        _VOL_NAME               volume-name
        check_command           check_vol_heal_status!cluster1!vol1
}
Volume Quota service to monitor the volume quota status as shown below:
define service {
        service_description     Volume Quota - volume-name
        host_name               cluster-name
        use                     gluster-service-without-graph
        _VOL_NAME               volume-name
        check_command           check_vol_quota_status!cluster-name!volume-name
        notes                   Volume type : Volume-Type
}
define service {
        service_description     Volume Geo Replication - volume-name
        host_name               cluster-name
        use                     gluster-service-without-graph
        _VOL_NAME               volume-name
        check_command           check_vol_georep_status!cluster-name!volume-name
}
define host {
        use                     gluster-host
        hostgroups              gluster_hosts,cluster-name
        alias                   host-name
        host_name               host-name      # Name given by user to identify the node in Nagios
        _HOST_UUID              host-uuid      # Host UUID returned by gluster peer status
        address                 host-address   # This can be FQDN or IP address of the host
}
define service {
        service_description     Brick Utilization - brick-path
        host_name               host-name      # Host name given in host definition
        use                     brick-service
        _VOL_NAME               Volume-Name
        notes                   Volume : Volume-Name
        _BRICK_DIR              brick-path
}
define service {
        service_description     Brick - brick-path
        host_name               host-name      # Host name given in host definition
        use                     gluster-brick-status-service
        _VOL_NAME               Volume-Name
        notes                   Volume : Volume-Name
        _BRICK_DIR              brick-path
}
4. Add host configurations and service configurations for all nodes in the cluster as shown in Step
3.
1. In the /etc/nagios directory of each Red Hat Gluster Storage node, edit the nagios_server.conf
file by setting the configurations as shown below:
# NAGIOS SERVER
# The nagios server IP address or FQDN to which the NSCA command
# needs to be sent
[NAGIOS-SERVER]
nagios_server=NagiosServerIPAddress
# CLUSTER NAME
# The host name of the logical cluster configured in Nagios under which
# the gluster volume services reside
[NAGIOS-DEFINTIONS]
cluster_name=cluster_auto
The nagios_server.conf file is used by glusterpmd service to get server name, host
name, and the process monitoring interval time.
2. Add normal_check_interval and set the time interval to 1 to check all Red Hat Gluster
Storage services every 1 minute as shown below:
define service {
name gluster-service
use generic-service
notifications_enabled 1
notification_period 24x7
notification_options w,u,c,r,f,s
notification_interval 120
register 0
contacts +ovirt,snmp
_GLUSTER_ENTITY HOST_SERVICE
normal_check_interval 1
}
3. To change this on individual service, add this property to the required service definition as
shown below:
define service {
name gluster-brick-status-service
use gluster-service
register 0
event_handler brick_status_event_handler
check_command check_brick_status
normal_check_interval 1
}
# INTERVAL LENGTH
# This is the seconds per unit interval as used in the
# host/contact/service configuration files. Setting this to 60 means
# that each interval is one minute long (60 seconds). Other settings
# have not been tested much, so your mileage is likely to vary...
interval_length=TimeInSeconds
The possible errors while configuring Nagios Service Check Acceptor (NSCA) and Nagios Remote
Plug-in Executor (NRPE) and the troubleshooting steps are listed in this section.
If port 5667 is not opened on the server host's firewall, a timeout error is displayed. Ensure
that port 5667 is opened.
1. Log in as root and run the following command on the Red Hat Gluster Storage node to get
the list of current iptables rules:
# iptables -L
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of
the current firewall rules:
# firewall-cmd --list-all-zones
2. If the port is open, 5667/tcp is listed beside ports: under one or more zones in your
output.
If the port is not open, add a firewall rule for the port:
1. If the port is not open, add an iptables rule by adding the following line in
/etc/sysconfig/iptables file:
Messages cannot be sent to the NSCA server if the Nagios server IP or FQDN, the cluster name, or the
host name (as configured in the Nagios server) is not configured correctly.
Open the Nagios server configuration file /etc/nagios/nagios_server.conf and verify that the
correct configurations are set as shown below:
# NAGIOS SERVER
# The nagios server IP address or FQDN to which the NSCA command
# needs to be sent
[NAGIOS-SERVER]
nagios_server=NagiosServerIPAddress
# CLUSTER NAME
# The host name of the logical cluster configured in Nagios under which
# the gluster volume services reside
[NAGIOS-DEFINTIONS]
cluster_name=cluster_auto
If Host name is updated, restart the NSCA service using the following command:
This error occurs if the IP address of the Nagios server is not defined in the nrpe.cfg file of
the Red Hat Gluster Storage node. To fix this issue, follow the steps given below:
allowed_hosts=127.0.0.1, NagiosServerIP
The allowed_hosts is the list of IP addresses which can execute NRPE commands.
2. Save the nrpe.cfg file and restart NRPE service using the following command:
On Nagios Server:
The default timeout value for the NRPE calls is 10 seconds and if the server does not respond
within 10 seconds, Nagios Server GUI displays an error that the NRPE call has timed out in 10
seconds. To fix this issue, change the timeout value for NRPE calls by modifying the command
definition configuration files.
1. Changing the NRPE timeout for services which directly invoke check_nrpe.
define command {
        command_name check_disk_and_inode
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t TimeInSeconds
}
2. Changing the NRPE timeout for the services in nagios-server-addons package which
invoke NRPE call through code.
define command {
        command_name check_vol_utilization
        command_line $USER1$/gluster/check_vol_server.py $ARG1$ $ARG2$ -w $ARG3$ -c $ARG4$ -o utilization -t TimeInSeconds
}
The auto configuration service gluster_auto_discovery makes NRPE calls for the
configuration details from the Red Hat Gluster Storage nodes. To change the NRPE
timeout value for the auto configuration service, modify the command definition
configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t
TimeInSeconds as shown below:
define command{
        command_name gluster_auto_discovery
        command_line sudo $USER1$/gluster/configure-gluster-nagios.py -H $ARG1$ -c $HOSTNAME$ -m auto -n $ARG2$ -t TimeInSeconds
}
1. Add the Nagios server IP address as described in the CHECK_NRPE: Error - Could Not Complete
SSL Handshake section under Troubleshooting NRPE Configuration Issues.
# vi /etc/nagios/nrpe.cfg
3. Search for the command_timeout and connection_timeout settings and change the
value. The command_timeout value must be greater than or equal to the timeout value
set in Nagios server.
This error occurs if the NRPE service is not running. To resolve this issue perform the steps
given below:
1. Log in as root to the Red Hat Gluster Storage node and run the following command to
verify the status of NRPE service:
2. If NRPE is not running, start the service using the following command:
This error is associated with firewalls and ports. The timeout error is displayed if the NRPE
traffic is not traversing a firewall, or if port 5666 is not open on the Red Hat Gluster Storage
node.
Ensure that port 5666 is open on the Red Hat Gluster Storage node.
1. Run the check_nrpe command from the Nagios server to verify if the port is open and if
NRPE is running on the Red Hat Gluster Storage node.
2. Log into the Nagios server as root and run the following command:
# /usr/lib64/nagios/plugins/check_nrpe -H RedHatStorageNodeIP
NRPE v2.14
If not, ensure that port 5666 is open on the Red Hat Gluster Storage node.
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of
the current iptables rules:
# iptables -L
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of
the current firewall rules:
# firewall-cmd --list-all-zones
2. If the port is open, 5666/tcp is listed beside ports: under one or more zones in your
output.
If the port is not open, add an iptables rule for the port.
# vi /etc/sysconfig/iptables
Use telnet to verify the Red Hat Gluster Storage node's ports. To verify the ports of the Red
Hat Gluster Storage node, perform the steps given below:
2. Test the connection on port 5666 from the Nagios server to the Red Hat Gluster Storage
node using the following command:
This error is due to port/firewall issues or incorrectly configured allowed_hosts directives. See
the sections CHECK_NRPE: Error - Could Not Complete SSL Handshake and CHECK_NRPE: Socket
Timeout After n Seconds for troubleshooting steps.
CHAPTER 19. MONITORING RED HAT GLUSTER STORAGE GLUSTER WORKLOAD
You can use the volume top and volume profile commands to view vital performance
information and identify bottlenecks on each brick of a volume.
You can also perform a statedump of the brick processes and NFS server process of a volume, and also
view volume status and volume information.
NOTE
If you restart the server process, the existing profile and top information will be
reset.
IMPORTANT
Running the profile command can affect system performance while the profile
information is being collected. Red Hat recommends that profiling be used only
for debugging.
When profiling is started on the volume, the following additional options are displayed when using the
volume info command:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
To view the I/O information of the bricks on a volume, use the following command:
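The command is not reproduced in this extract; a minimal sketch, assuming the standard gluster volume profile syntax (profiling must first be started on the volume):
# gluster volume profile VOLNAME start
# gluster volume profile VOLNAME info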
------------------
------------------
Duration : 335
BytesRead : 94505058
BytesWritten : 195571980
To view the I/O information of the NFS server on a specified volume, use the following command:
For example, to view the I/O information of the NFS server on test-volume:
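A minimal sketch of the command for this example, assuming the standard profile syntax with the nfs keyword:
# gluster volume profile test-volume info nfs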
Interval 1 Stats:
Block Size: 32768b+ 65536b+
No. of Reads: 0 0
No. of Writes: 1000 1000
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls     Fop
---------   -----------   -----------   -----------   ------------     ----
     0.01     410.33 us     194.00 us     641.00 us              3     STATFS
     0.60     465.44 us     346.00 us     867.00 us            147     FSTAT
     1.63     187.21 us      67.00 us    6081.00 us           1000     SETATTR
     1.94     221.40 us      58.00 us   55399.00 us           1002     ACCESS
19.2.1. Viewing Open File Descriptor Count and Maximum File Descriptor Count
You can view the current open file descriptor count and the list of files that are currently being
accessed on the brick with the volume top command. The volume top command also displays the
maximum open file descriptor count of files that are currently open, and the maximum number of files
opened at any given point of time since the servers came up. If the brick name is not
specified, then the open file descriptor metrics of all the bricks belonging to the volume are displayed.
To view the open file descriptor count and the maximum file descriptor count, use the following
command:
# gluster volume top VOLNAME open [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the open file descriptor count and the maximum file descriptor count on brick
server:/export on test-volume, and list the top 10 open calls:
 2   /clients/client0/~dmtmp/PARADOX/COURSES.DB
11   /clients/client0/~dmtmp/PARADOX/ENROLL.DB
11   /clients/client0/~dmtmp/PARADOX/STUDENTS.DB
10   /clients/client0/~dmtmp/PWRPNT/TIPS.PPT
10   /clients/client0/~dmtmp/PWRPNT/PCBENCHM.PPT
 9   /clients/client7/~dmtmp/PARADOX/STUDENTS.DB
 9   /clients/client1/~dmtmp/PARADOX/STUDENTS.DB
 9   /clients/client2/~dmtmp/PARADOX/STUDENTS.DB
 9   /clients/client0/~dmtmp/PARADOX/STUDENTS.DB
 9   /clients/client8/~dmtmp/PARADOX/STUDENTS.DB
# gluster volume top VOLNAME read [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the highest read calls on brick server:/export of test-volume:
read filename
call count
116 /clients/client0/~dmtmp/SEED/LARGE.FIL
64 /clients/client0/~dmtmp/SEED/MEDIUM.FIL
54 /clients/client2/~dmtmp/SEED/LARGE.FIL
54 /clients/client6/~dmtmp/SEED/LARGE.FIL
54 /clients/client5/~dmtmp/SEED/LARGE.FIL
54 /clients/client0/~dmtmp/SEED/LARGE.FIL
54 /clients/client3/~dmtmp/SEED/LARGE.FIL
54 /clients/client4/~dmtmp/SEED/LARGE.FIL
54 /clients/client9/~dmtmp/SEED/LARGE.FIL
54 /clients/client8/~dmtmp/SEED/LARGE.FIL
# gluster volume top VOLNAME write [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the highest write calls on brick server:/export of test-volume:
83 /clients/client0/~dmtmp/SEED/LARGE.FIL
59 /clients/client7/~dmtmp/SEED/LARGE.FIL
59 /clients/client1/~dmtmp/SEED/LARGE.FIL
59 /clients/client2/~dmtmp/SEED/LARGE.FIL
59 /clients/client0/~dmtmp/SEED/LARGE.FIL
59 /clients/client8/~dmtmp/SEED/LARGE.FIL
59 /clients/client5/~dmtmp/SEED/LARGE.FIL
59 /clients/client4/~dmtmp/SEED/LARGE.FIL
59 /clients/client6/~dmtmp/SEED/LARGE.FIL
59 /clients/client3/~dmtmp/SEED/LARGE.FIL
To view the highest open() calls on each directory, use the following command:
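The syntax line is missing from this extract; a hedged sketch, assuming the opendir keyword of the volume top command:
# gluster volume top VOLNAME opendir [brick BRICK-NAME] [list-cnt cnt]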
For example, to view the highest open calls on brick server:/export/ of test-volume:
1001 /clients/client0/~dmtmp
454 /clients/client8/~dmtmp
454 /clients/client2/~dmtmp
454 /clients/client6/~dmtmp
454 /clients/client5/~dmtmp
454 /clients/client9/~dmtmp
443 /clients/client0/~dmtmp/PARADOX
408 /clients/client1/~dmtmp
408 /clients/client7/~dmtmp
402 /clients/client4/~dmtmp
To view the highest directory read() calls on each brick, use the following command:
# gluster volume top VOLNAME readdir [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the highest directory read calls on brick server:/export/ of test-volume:
1996 /clients/client0/~dmtmp
1083 /clients/client0/~dmtmp/PARADOX
904 /clients/client8/~dmtmp
904 /clients/client2/~dmtmp
904 /clients/client6/~dmtmp
904 /clients/client5/~dmtmp
904 /clients/client9/~dmtmp
812 /clients/client1/~dmtmp
812 /clients/client7/~dmtmp
800 /clients/client4/~dmtmp
This command initiates a read() call for the specified count and block size and measures the
corresponding throughput directly on the back-end export, bypassing glusterFS processes.
To view the read performance on each brick, use the command, specifying options as needed:
# gluster volume top VOLNAME read-perf [bs blk-size count count] [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the read performance on brick server:/export/ of test-volume, specifying a
256 block size, and list the top 10 results:
This command initiates a write operation for the specified count and block size and measures the
corresponding throughput directly on back-end export, bypassing glusterFS processes.
To view the write performance on each brick, use the following command, specifying options as
needed:
# gluster volume top VOLNAME write-perf [bs blk-size count count] [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the write performance on brick server:/export/ of test-volume, specifying a
256 block size, and list the top 10 results:
The gstatus command provides an easy-to-use, high-level view of the health of a trusted storage
pool with a single command. By executing the glusterFS commands, it gathers information about the
statuses of the Red Hat Gluster Storage nodes, volumes, and bricks. The checks are performed across
the trusted storage pool and the status is displayed. This data can be analyzed to add further checks
and incorporate deployment best-practices and free-space triggers.
A Red Hat Gluster Storage volume is made from individual file systems (glusterFS bricks) across
multiple nodes. Although the complexity is abstracted, the status of the individual bricks affects the
data availability of the volume. For example, even without replication, the loss of a single brick in the
volume will not cause the volume itself to be unavailable; instead, this manifests as inaccessible
files in the file system.
19.3.1.1. Prerequisites
Package dependencies
To install gstatus, refer to the Deploying gstatus on Red Hat Gluster Storage chapter in the Red Hat
Gluster Storage 3.3 Installation Guide.
# gstatus -h
Usage: gstatus [options]
Option Description
-s, --state Displays the high level health of the Red Hat Gluster
Storage trusted storage pool.
Command Description
gstatus -vl VOLNAME View the volume details, including the brick layout.
Command Description
gstatus -o <keyvalue> View the summary output for Nagios and Logstash.
Each invocation of gstatus provides a header section, which provides a high level view of the state of
the Red Hat Gluster Storage trusted storage pool. The Status field within the header offers two
states; Healthy and Unhealthy. When problems are detected, the status field changes to
Unhealthy(n), where n denotes the total number of issues that have been detected.
The following examples illustrate gstatus command output for both healthy and unhealthy Red Hat
Gluster Storage environments.
Example 19.1. Example 1: Trusted Storage Pool is in a healthy state; all nodes, volumes and
bricks are online
# gstatus -a
Nodes : 4/ 4 Volumes: 1 Up
Self Heal: 4/ 4 0 Up(Degraded)
Bricks : 4/ 4 0 Up(Partial)
Connections : 5 / 20 0 Down
Volume Information
splunk UP - 4/4 bricks up - Distributed-Replicate
Capacity: (18% used) 3.00 GiB/18.00 GiB (used/total)
Snapshots: 0
Self Heal: 4/ 4
Tasks Active: None
Protocols: glusterfs:on NFS:on SMB:off
Gluster Connectivty: 5 hosts, 20 tcp connections
Status Messages
- Cluster is HEALTHY, all_bricks checks successful
# gstatus -al
OverCommit: No Snapshots: 0
Nodes : 3/ 4 Volumes: 0 Up
Self Heal: 3/ 4 1 Up(Degraded)
Bricks : 3/ 4 0 Up(Partial)
Connections : 5/ 20 0 Down
Volume Information
splunk UP(DEGRADED) - 3/4 bricks up - Distributed-Replicate
Capacity: (18% used) 3.00 GiB/18.00 GiB (used/total)
Snapshots: 0
Self Heal: 3/ 4
Tasks Active: None
Protocols: glusterfs:on NFS:on SMB:off
Gluster Connectivty: 5 hosts, 20 tcp connections
splunk---------- +
|
Distribute (dht)
|
+-- Repl Set 0 (afr)
| |
| +--splunk-rhs1:/rhgs/brick1/splunk(UP)
2.00 GiB/9.00 GiB
| |
| +--splunk-rhs2:/rhgs/brick1/splunk(UP)
2.00 GiB/9.00 GiB
|
+-- Repl Set 1 (afr)
|
+--splunk-
rhs3:/rhgs/brick1/splunk(DOWN) 0.00 KiB/0.00 KiB
|
+--splunk-rhs4:/rhgs/brick1/splunk(UP)
2.00 GiB/9.00 GiB
Status Messages
- Cluster is UNHEALTHY
- One of the nodes in the cluster is down
- Brick splunk-rhs3:/rhgs/brick1/splunk in volume 'splunk' is
down/unavailable
- INFO -> Not all bricks are online, so capacity provided is NOT
accurate
Example 2 displays the output of the command when the -l option is used. The brick layout mode
shows the brick and node relationships. This provides a simple means of checking that the replication
relationships for bricks across nodes are as intended.
Field Description
Volume State        Up – The volume is started and available, and all the bricks are up.
Over-commit Status  The physical file system used by a brick could be reused by
                    multiple volumes; this field indicates whether a brick is used by
                    multiple volumes. Reusing a brick for multiple volumes is not
                    recommended, as it exposes the system to capacity conflicts across
                    different volumes when the quota feature is not in use.
Using the command line interface, external applications can invoke the command on all nodes of the
trusted storage pool, and parse and collate the data obtained from all these nodes to get an easy-to-
use and complete picture of the state of the trusted storage pool in a machine-parseable format.
Section Description
Global options      Displays cluster-specific options that have been set explicitly through the volume set command.
# gluster get-state
glusterd state dumped to /var/run/gluster/glusterd_state_timestamp
[Global]
MYUUID: 1e20ed87-c22a-4612-ab04-90765bccaea5
op-version: 40000
[Global options]
cluster.server-quorum-ratio: 60
[Peers]
Peer1.primary_hostname: output omitted
Peer1.uuid: dfc7ff96-b61d-4c88-a3ad-b6852f72c5f0
Peer1.state: Peer in Cluster
Peer1.connected: Connected
Peer1.othernames:
Peer2.primary_hostname: output omitted
Peer2.uuid: dd83409e-22fa-4186-935a-648a1927cc9d
Peer2.state: Peer in Cluster
Peer2.connected: Connected
Peer2.othernames:
[Volumes]
Volume1.name: tv1
Volume1.id: cf89d345-8cde-4c53-be85-1f3f20e7e410
Volume1.type: Distribute
Volume1.transport_type: tcp
Volume1.status: Started
Volume1.brickcount: 3
Volume1.Brick1.path: output omitted:/root/bricks/tb11
Volume1.Brick1.hostname: output omitted
Volume1.Brick1.port: 49152
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Started
Volume1.Brick1.signedin: True
Volume1.Brick2.path: output omitted:/root/bricks/tb12
Volume1.Brick2.hostname: output omitted
Volume1.Brick3.path: output omitted:/root/bricks/tb13
Volume1.Brick3.hostname: output omitted
Volume1.snap_count: 0
Volume1.stripe_count: 1
Volume1.replica_count: 1
Volume1.subvol_count: 3
Volume1.arbiter_count: 0
Volume1.disperse_count: 0
Volume1.redundancy_count: 0
Volume1.quorum_status: not_applicable
Volume1.snapd_svc.online_status: Online
Volume1.snapd_svc.inited: True
Volume1.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume1.rebalance.status: not_started
Volume1.rebalance.failures: 0
Volume1.rebalance.skipped: 0
Volume1.rebalance.lookedup: 0
Volume1.rebalance.files: 0
Volume1.rebalance.data: 0Bytes
[Volume1.options]
features.uss: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
Volume2.name: tv2
Volume2.id: 700fd588-6fc2-46d5-9435-39c434656fe2
Volume2.type: Distribute
Volume2.transport_type: tcp
Volume2.status: Created
Volume2.brickcount: 3
Volume2.Brick1.path: output omitted:/root/bricks/tb21
Volume2.Brick1.hostname: output omitted
Volume2.Brick1.port: 0
Volume2.Brick1.rdma_port: 0
Volume2.Brick1.status: Stopped
Volume2.Brick1.signedin: False
Volume2.Brick2.path: output omitted:/root/bricks/tb22
Volume2.Brick2.hostname: output omitted
Volume2.Brick3.path: output omitted:/root/bricks/tb23
Volume2.Brick3.hostname: output omitted
Volume2.snap_count: 0
Volume2.stripe_count: 1
Volume2.replica_count: 1
Volume2.subvol_count: 3
Volume2.arbiter_count: 0
Volume2.disperse_count: 0
Volume2.redundancy_count: 0
Volume2.quorum_status: not_applicable
Volume2.snapd_svc.online_status: Offline
Volume2.snapd_svc.inited: False
Volume2.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume2.rebalance.status: not_started
Volume2.rebalance.failures: 0
Volume2.rebalance.skipped: 0
Volume2.rebalance.lookedup: 0
Volume2.rebalance.files: 0
Volume2.rebalance.data: 0Bytes
[Volume2.options]
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
Volume3.name: tv3
Volume3.id: 97b94d77-116a-4595-acfc-9676e4ebcbd2
Volume3.type: Tier
Volume3.transport_type: tcp
Volume3.status: Stopped
Volume3.brickcount: 4
Volume3.Brick1.path: output omitted:/root/bricks/tb34
Volume3.Brick1.hostname: output omitted
Volume3.Brick2.path: output omitted:/root/bricks/tb33
Volume3.Brick2.hostname: output omitted
Volume3.Brick3.path: output omitted:/root/bricks/tb31
Volume3.Brick3.hostname: output omitted
Volume3.Brick3.port: 49154
Volume3.Brick3.rdma_port: 0
Volume3.Brick3.status: Stopped
Volume3.Brick3.signedin: False
Volume3.Brick3.tier: Cold
Volume3.Brick4.path: output omitted:/root/bricks/tb32
Volume3.Brick4.hostname: output omitted
Volume3.snap_count: 0
Volume3.stripe_count: 1
Volume3.replica_count: 2
Volume3.subvol_count: 2
Volume3.arbiter_count: 0
Volume3.disperse_count: 0
Volume3.redundancy_count: 0
Volume3.quorum_status: not_applicable
Volume3.snapd_svc.online_status: Offline
Volume3.snapd_svc.inited: True
Volume3.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume3.rebalance.status: not_started
Volume3.rebalance.failures: 0
Volume3.rebalance.skipped: 0
Volume3.rebalance.lookedup: 0
Volume3.rebalance.files: 0
Volume3.rebalance.data: 0Bytes
Volume3.tier_info.cold_tier_type: Replicate
Volume3.tier_info.cold_brick_count: 2
Volume3.tier_info.cold_replica_count: 2
Volume3.tier_info.cold_disperse_count: 0
Volume3.tier_info.cold_dist_leaf_count: 2
Volume3.tier_info.cold_redundancy_count: 0
Volume3.tier_info.hot_tier_type: Replicate
Volume3.tier_info.hot_brick_count: 2
Volume3.tier_info.hot_replica_count: 2
Volume3.tier_info.promoted: 0
Volume3.tier_info.demoted: 0
[Volume3.options]
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
Volume4.name: tv4
Volume4.id: ad7260ac-0d5c-461f-a39c-a0f4a4ff854b
Volume4.type: Distribute
Volume4.transport_type: tcp
Volume4.status: Started
Volume4.brickcount: 2
Volume4.Brick1.path: output omitted:/root/bricks/tb41
Volume4.Brick1.hostname: output omitted
Volume4.Brick2.path: output omitted:/root/bricks/tb42
Volume4.Brick2.hostname: output omitted
Volume4.snapshot1.name: tv4-snap_GMT-2016.11.24-12.10.15
Volume4.snapshot1.id: 2eea76ae-c99f-4128-b5c0-3233048312f2
Volume4.snapshot1.time: 2016-11-24 12:10:15
Volume4.snapshot1.status: in_use
Volume4.snap_count: 1
Volume4.stripe_count: 1
Volume4.subvol_count: 2
Volume4.arbiter_count: 0
Volume4.disperse_count: 0
Volume4.redundancy_count: 0
Volume4.quorum_status: not_applicable
Volume4.snapd_svc.online_status: Offline
Volume4.snapd_svc.inited: True
Volume4.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume4.rebalance.status: not_started
Volume4.rebalance.failures: 0
Volume4.rebalance.skipped: 0
Volume4.rebalance.lookedup: 0
Volume4.rebalance.files: 0
Volume4.rebalance.data: 0
[Volume4.options]
features.uss: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[Services]
svc1.name: glustershd
svc1.online_status: Offline
svc2.name: nfs
svc2.online_status: Offline
svc3.name: bitd
svc3.online_status: Offline
svc4.name: scrub
svc4.online_status: Offline
svc5.name: quotad
svc5.online_status: Offline
[Misc]
Base port: 49152
Last allocated port: 49154
Invocation of the gluster get-state volumeoptions lists all volume options irrespective of
whether the volume option has been explicitly set or not.
[Volume Options]
Volume1.name: Distvol
Volume1.options.count: 309
Volume1.options.value309: 0
Volume1.options.key309: cluster.max-bricks-per-process
Volume1.options.value308: off
Volume1.options.key308: cluster.brick-multiplex
Volume1.options.value307: on
Volume1.options.key307: disperse.optimistic-change-log
Volume1.options.value306: 60
Volume1.options.key306: performance.nl-cache-timeout
Volume1.options.value305: 10MB
Volume1.options.key305: performance.nl-cache-limit
Volume1.options.value304: false
Volume1.options.key304: performance.nl-cache-positive-entry
Volume1.options.value303: 10MB
Volume1.options.key303: performance.rda-cache-limit
Volume1.options.value302: 128KB
Volume1.options.key302: performance.rda-high-wmark
Volume1.options.value301: 4096
Volume1.options.key301: performance.rda-low-wmark
Volume1.options.value300: 131072
Volume1.options.key300: performance.rda-request-size
Volume1.options.value299: off
Volume1.options.key299: performance.parallel-readdir
Volume1.options.value298: off
Volume1.options.key298: cluster.use-compound-fops
Volume1.options.value297: 1024
Volume1.options.key297: disperse.shd-wait-qlength
Volume1.options.value296: 1
Volume1.options.key296: disperse.shd-max-threads
Volume1.options.value295: off
Volume1.options.key295: cluster.use-compound-fops
Volume1.options.value294: no
Volume1.options.key294: cluster.granular-entry-heal
Volume1.options.value293: full
Volume1.options.key293: cluster.locking-scheme
Volume1.options.value292: 1024
Volume1.options.key292: cluster.shd-wait-qlength
Volume1.options.value291: 1
Volume1.options.key291: cluster.shd-max-threads
Volume1.options.value290: round-robin
Volume1.options.key290: disperse.read-policy
Volume1.options.value289: on
Volume1.options.key289: dht.force-readdirp
Volume1.options.value288: 600
Volume1.options.key288: cluster.heal-timeout
Volume1.options.value287: 128
Volume1.options.key287: disperse.heal-wait-qlength
Volume1.options.value286: 8
Volume1.options.key286: disperse.background-heals
Volume1.options.value285: 60
Volume1.options.key285: features.lease-lock-recall-timeout
Volume1.options.value284: off
Volume1.options.key284: features.leases
Volume1.options.value283: 60
Volume1.options.key283: features.cache-invalidation-timeout
Volume1.options.value282: off
Volume1.options.key282: features.cache-invalidation
Volume1.options.value281: 120
Volume1.options.key281: features.expiry-time
Volume1.options.value280: false
Volume1.options.key280: features.scrub
Volume1.options.value279: biweekly
Volume1.options.key279: features.scrub-freq
Volume1.options.value278: lazy
Volume1.options.key278: features.scrub-throttle
Volume1.options.value277: 64MB
Volume1.options.key277: features.shard-block-size
Volume1.options.value276: off
Volume1.options.key276: features.shard
Volume1.options.value275: off
Volume1.options.key275: ganesha.enable
Volume1.options.value274: (null)
Volume1.options.key274: client.bind-insecure
Volume1.options.value273: no
Volume1.options.key273: cluster.quorum-reads
Volume1.options.value272: enable
Volume1.options.key272: cluster.disperse-self-heal-daemon
Volume1.options.value271: off
Volume1.options.key271: locks.mandatory-locking
Volume1.options.value270: off
Volume1.options.key270: locks.trace
Volume1.options.value269: 25000
Volume1.options.key269: features.ctr-sql-db-wal-autocheckpoint
Volume1.options.value268: 12500
Volume1.options.key268: features.ctr-sql-db-cachesize
Volume1.options.value267: 300
Volume1.options.key267: features.ctr_lookupheal_inode_timeout
Volume1.options.value266: 300
Volume1.options.key266: features.ctr_lookupheal_link_timeout
Volume1.options.value265: off
Volume1.options.key265: features.ctr_link_consistency
Volume1.options.value264: off
Volume1.options.key264: features.ctr-record-metadata-heat
Volume1.options.value263: off
Volume1.options.key263: features.record-counters
Volume1.options.value262: off
Volume1.options.key262: features.ctr-enabled
Volume1.options.value261: 100
Volume1.options.key261: cluster.tier-query-limit
Volume1.options.value260: 10000
Volume1.options.key260: cluster.tier-max-files
Volume1.options.value259: 4000
Volume1.options.key259: cluster.tier-max-mb
Volume1.options.value258: 0
Volume1.options.key258: cluster.tier-max-promote-file-size
Volume1.options.value257: cache
Volume1.options.key257: cluster.tier-mode
Volume1.options.value256: 75
Volume1.options.key256: cluster.watermark-low
Volume1.options.value255: 90
Volume1.options.key255: cluster.watermark-hi
Volume1.options.value254: 3600
Volume1.options.key254: cluster.tier-demote-frequency
Volume1.options.value253: 120
Volume1.options.key253: cluster.tier-promote-frequency
Volume1.options.value252: off
Volume1.options.key252: cluster.tier-pause
Volume1.options.value251: 0
Volume1.options.key251: cluster.read-freq-threshold
Volume1.options.value250: 0
Volume1.options.key250: cluster.write-freq-threshold
Volume1.options.value249: disable
Volume1.options.key249: cluster.enable-shared-storage
Volume1.options.value248: off
Volume1.options.key248: features.trash-internal-op
Volume1.options.value247: 5MB
Volume1.options.key247: features.trash-max-filesize
Volume1.options.value246: (null)
Volume1.options.key246: features.trash-eliminate-path
Volume1.options.value245: .trashcan
Volume1.options.key245: features.trash-dir
Volume1.options.value244: off
Volume1.options.key244: features.trash
Volume1.options.value243: 120
Volume1.options.key243: features.barrier-timeout
Volume1.options.value242: disable
Volume1.options.key242: features.barrier
Volume1.options.value241: off
Volume1.options.key241: changelog.capture-del-path
Volume1.options.value240: 120
Volume1.options.key240: changelog.changelog-barrier-timeout
Volume1.options.value239: 5
Volume1.options.key239: changelog.fsync-interval
Volume1.options.value238: 15
Volume1.options.key238: changelog.rollover-time
Volume1.options.value237: ascii
Volume1.options.key237: changelog.encoding
Volume1.options.value236: (null)
Volume1.options.key236: changelog.changelog-dir
Volume1.options.value235: off
Volume1.options.key235: changelog.changelog
Volume1.options.value234: 0
Volume1.options.key234: cluster.server-quorum-ratio
Volume1.options.value233: off
Volume1.options.key233: cluster.server-quorum-type
Volume1.options.value232: off
Volume1.options.key232: storage.bd-aio
Volume1.options.value231: off
Volume1.options.key231: storage.build-pgfid
Volume1.options.value230: 30
Volume1.options.key230: storage.health-check-interval
Volume1.options.value229: off
Volume1.options.key229: storage.node-uuid-pathinfo
Volume1.options.value228: -1
Volume1.options.key228: storage.owner-gid
Volume1.options.value227: -1
Volume1.options.key227: storage.owner-uid
Volume1.options.value226: 0
Volume1.options.key226: storage.batch-fsync-delay-usec
Volume1.options.value225: reverse-fsync
Volume1.options.key225: storage.batch-fsync-mode
Volume1.options.value224: off
Volume1.options.key224: storage.linux-aio
Volume1.options.value223: 180
Volume1.options.key223: features.auto-commit-period
Volume1.options.value222: relax
Volume1.options.key222: features.retention-mode
Volume1.options.value221: 120
Volume1.options.key221: features.default-retention-period
Volume1.options.value220: off
Volume1.options.key220: features.worm-file-level
Volume1.options.value219: off
Volume1.options.key219: features.worm
Volume1.options.value218: off
Volume1.options.key218: features.read-only
Volume1.options.value217: (null)
Volume1.options.key217: nfs.auth-cache-ttl-sec
Volume1.options.value216: (null)
Volume1.options.key216: nfs.auth-refresh-interval-sec
Volume1.options.value215: (null)
Volume1.options.key215: nfs.exports-auth-enable
Volume1.options.value214: on
Volume1.options.key214: nfs.rdirplus
Volume1.options.value213: (1 * 1048576ULL)
Volume1.options.key213: nfs.readdir-size
Volume1.options.value212: (1 * 1048576ULL)
Volume1.options.key212: nfs.write-size
Volume1.options.value211: (1 * 1048576ULL)
Volume1.options.key211: nfs.read-size
Volume1.options.value210: 0x20000
Volume1.options.key210: nfs.drc-size
Volume1.options.value209: off
Volume1.options.key209: nfs.drc
Volume1.options.value208: off
Volume1.options.key208: nfs.server-aux-gids
Volume1.options.value207: /sbin/rpc.statd
Volume1.options.key207: nfs.rpc-statd
Volume1.options.value206: /var/lib/glusterd/nfs/rmtab
Volume1.options.key206: nfs.mount-rmtab
Volume1.options.value205: off
Volume1.options.key205: nfs.mount-udp
Volume1.options.value204: ON
Volume1.options.key204: nfs.acl
Volume1.options.value203: on
Volume1.options.key203: nfs.nlm
Volume1.options.value202: ON
Volume1.options.key202: nfs.disable
Volume1.options.value201:
Volume1.options.key201: nfs.export-dir
Volume1.options.value200: read-write
Volume1.options.key200: nfs.volume-access
Volume1.options.value199: off
Volume1.options.key199: nfs.trusted-write
Volume1.options.value198: off
Volume1.options.key198: nfs.trusted-sync
Volume1.options.value197: off
Volume1.options.key197: nfs.ports-insecure
Volume1.options.value196: none
Volume1.options.key196: nfs.rpc-auth-reject
Volume1.options.value195: all
Volume1.options.key195: nfs.rpc-auth-allow
Volume1.options.value194: on
Volume1.options.key194: nfs.rpc-auth-null
Volume1.options.value193: on
Volume1.options.key193: nfs.rpc-auth-unix
Volume1.options.value192: 2049
Volume1.options.key192: nfs.port
Volume1.options.value191: 16
Volume1.options.key191: nfs.outstanding-rpc-limit
Volume1.options.value190: on
Volume1.options.key190: nfs.register-with-portmap
Volume1.options.value189: off
Volume1.options.key189: nfs.dynamic-volumes
Volume1.options.value188: off
Volume1.options.key188: nfs.addr-namelookup
Volume1.options.value187: on
Volume1.options.key187: nfs.export-volumes
Volume1.options.value186: on
Volume1.options.key186: nfs.export-dirs
Volume1.options.value185: 15
Volume1.options.key185: nfs.mem-factor
Volume1.options.value184: ON
Volume1.options.key184: nfs.enable-ino32
Volume1.options.value183: (null)
Volume1.options.key183: debug.error-fops
Volume1.options.value182: off
Volume1.options.key182: debug.random-failure
Volume1.options.value181: (null)
Volume1.options.key181: debug.error-number
Volume1.options.value180: (null)
Volume1.options.key180: debug.error-failure
Volume1.options.value179: off
Volume1.options.key179: debug.error-gen
Volume1.options.value178: (null)
Volume1.options.key178: debug.include-ops
Volume1.options.value177: (null)
Volume1.options.key177: debug.exclude-ops
Volume1.options.value176: no
Volume1.options.key176: debug.log-file
Volume1.options.value175: no
Volume1.options.key175: debug.log-history
Volume1.options.value174: off
Volume1.options.key174: debug.trace
Volume1.options.value173: disable
Volume1.options.key173: features.bitrot
Volume1.options.value172: off
Volume1.options.key172: features.inode-quota
Volume1.options.value171: off
Volume1.options.key171: features.quota
Volume1.options.value170: off
Volume1.options.key170: geo-replication.ignore-pid-check
Volume1.options.value169: off
Volume1.options.key169: geo-replication.ignore-pid-check
Volume1.options.value168: off
Volume1.options.key168: geo-replication.indexing
Volume1.options.value167: off
Volume1.options.key167: geo-replication.indexing
Volume1.options.value166: off
Volume1.options.key166: features.quota-deem-statfs
Volume1.options.value165: 86400
Volume1.options.key165: features.alert-time
Volume1.options.value164: 5
Volume1.options.key164: features.hard-timeout
Volume1.options.value163: 60
Volume1.options.key163: features.soft-timeout
Volume1.options.value162: 80%
Volume1.options.key162: features.default-soft-limit
Volume1.options.value161: 0
Volume1.options.key161: features.quota-timeout
Volume1.options.value160: (null)
Volume1.options.key160: features.limit-usage
Volume1.options.value159: false
Volume1.options.key159: network.compression.debug
Volume1.options.value158: -1
Volume1.options.key158: network.compression.compression-level
Volume1.options.value157: 0
Volume1.options.key157: network.compression.min-size
Volume1.options.value156: 8
Volume1.options.key156: network.compression.mem-level
Volume1.options.value155: -15
Volume1.options.key155: network.compression.window-size
Volume1.options.value154: off
Volume1.options.key154: network.compression
Volume1.options.value153: off
Volume1.options.key153: features.show-snapshot-directory
Volume1.options.value152: .snaps
Volume1.options.key152: features.snapshot-directory
Volume1.options.value151: off
Volume1.options.key151: features.uss
Volume1.options.value150: false
Volume1.options.key150: performance.cache-invalidation
Volume1.options.value149: true
Volume1.options.key149: performance.force-readdirp
Volume1.options.value148: off
Volume1.options.key148: performance.nfs.io-threads
Volume1.options.value147: off
Volume1.options.key147: performance.nfs.stat-prefetch
Volume1.options.value146: off
Volume1.options.key146: performance.nfs.quick-read
Volume1.options.value145: off
Volume1.options.key145: performance.nfs.io-cache
Volume1.options.value144: off
Volume1.options.key144: performance.nfs.read-ahead
Volume1.options.value143: on
Volume1.options.key143: performance.nfs.write-behind
Volume1.options.value142: on
Volume1.options.key142: performance.client-io-threads
Volume1.options.value141: on
Volume1.options.key141: performance.stat-prefetch
Volume1.options.value140: off
Volume1.options.key140: performance.nl-cache
Volume1.options.value139: on
Volume1.options.key139: performance.open-behind
Volume1.options.value138: on
Volume1.options.key138: performance.quick-read
Volume1.options.value137: on
Volume1.options.key137: performance.io-cache
Volume1.options.value136: on
Volume1.options.key136: performance.readdir-ahead
Volume1.options.value135: on
Volume1.options.key135: performance.read-ahead
Volume1.options.value134: on
Volume1.options.key134: performance.write-behind
Volume1.options.value133: 10
Volume1.options.key133: transport.listen-backlog
Volume1.options.value132: inet
Volume1.options.key132: transport.address-family
Volume1.options.value131: (null)
Volume1.options.key131: ssl.ec-curve
Volume1.options.value130: (null)
Volume1.options.key130: ssl.dh-param
Volume1.options.value129: (null)
Volume1.options.key129: ssl.cipher-list
Volume1.options.value128: (null)
Volume1.options.key128: ssl.certificate-depth
Volume1.options.value127: (null)
Volume1.options.key127: ssl.crl-path
Volume1.options.value126: (null)
Volume1.options.key126: ssl.ca-list
Volume1.options.value125: (null)
Volume1.options.key125: ssl.private-key
Volume1.options.value124: (null)
Volume1.options.key124: ssl.own-cert
Volume1.options.value123: 1
Volume1.options.key123: server.event-threads
Volume1.options.value122: (null)
Volume1.options.key122: server.own-thread
Volume1.options.value121: 300
Volume1.options.key121: server.gid-timeout
Volume1.options.value120: on
Volume1.options.key120: client.send-gids
Volume1.options.value119: on
Volume1.options.key119: server.dynamic-auth
Volume1.options.value118: off
Volume1.options.key118: server.manage-gids
Volume1.options.value117: *
Volume1.options.key117: auth.ssl-allow
Volume1.options.value116: (null)
Volume1.options.key116: server.ssl
Volume1.options.value115: 10
Volume1.options.key115: features.grace-timeout
Volume1.options.value114: off
Volume1.options.key114: features.lock-heal
Volume1.options.value113: 64
Volume1.options.key113: server.outstanding-rpc-limit
Volume1.options.value112: /var/run/gluster
Volume1.options.key112: server.statedump-path
Volume1.options.value111: 65534
Volume1.options.key111: server.anongid
Volume1.options.value110: 65534
Volume1.options.key110: server.anonuid
Volume1.options.value109: off
Volume1.options.key109: server.root-squash
Volume1.options.value108: (null)
Volume1.options.key108: server.allow-insecure
Volume1.options.value107: (null)
Volume1.options.key107: transport.keepalive
Volume1.options.value106: (null)
Volume1.options.key106: auth.reject
Volume1.options.value105: *
Volume1.options.key105: auth.allow
Volume1.options.value104: 16384
Volume1.options.key104: network.inode-lru-limit
Volume1.options.value103: (null)
Volume1.options.key103: network.tcp-window-size
Volume1.options.value102: 42
Volume1.options.key102: network.ping-timeout
Volume1.options.value101: 2
Volume1.options.key101: client.event-threads
Volume1.options.value100: disable
Volume1.options.key100: network.remote-dio
Volume1.options.value99: 10
Volume1.options.key99: features.grace-timeout
Volume1.options.value98: off
Volume1.options.key98: features.lock-heal
Volume1.options.value97: (null)
Volume1.options.key97: network.tcp-window-size
Volume1.options.value96: 42
Volume1.options.key96: network.ping-timeout
Volume1.options.value95: 1800
Volume1.options.key95: network.frame-timeout
Volume1.options.value94: 4096
Volume1.options.key94: encryption.block-size
Volume1.options.value93: 256
Volume1.options.key93: encryption.data-key-size
Volume1.options.value92: (null)
Volume1.options.key92: encryption.master-key
Volume1.options.value91: off
Volume1.options.key91: features.encryption
Volume1.options.value90: false
Volume1.options.key90: performance.cache-samba-metadata
Volume1.options.value89: true
Volume1.options.key89: performance.cache-swift-metadata
Volume1.options.value88: 1
Volume1.options.key88: performance.md-cache-timeout
Volume1.options.value87: 4
Volume1.options.key87: performance.read-ahead-page-count
Volume1.options.value86: no
Volume1.options.key86: performance.read-after-open
Volume1.options.value85: yes
Volume1.options.key85: performance.lazy-open
Volume1.options.value84: off
Volume1.options.key84: performance.nfs.strict-write-ordering
Volume1.options.value83: off
Volume1.options.key83: performance.strict-write-ordering
Volume1.options.value82: off
Volume1.options.key82: performance.nfs.strict-o-direct
Volume1.options.value81: off
Volume1.options.key81: performance.strict-o-direct
Volume1.options.value80: 1MB
Volume1.options.key80: performance.nfs.write-behind-window-size
Volume1.options.value79: off
Volume1.options.key79: performance.resync-failed-syncs-after-fsync
Volume1.options.value78: 1MB
Volume1.options.key78: performance.write-behind-window-size
Volume1.options.value77: on
Volume1.options.key77: performance.nfs.flush-behind
Volume1.options.value76: on
Volume1.options.key76: performance.flush-behind
Volume1.options.value75: 128MB
Volume1.options.key75: performance.cache-size
Volume1.options.value74: 0
Volume1.options.key74: performance.least-rate-limit
Volume1.options.value73: on
Volume1.options.key73: performance.enable-least-priority
Volume1.options.value72: 1
Volume1.options.key72: performance.least-prio-threads
Volume1.options.value71: 16
Volume1.options.key71: performance.low-prio-threads
Volume1.options.value70: 16
Volume1.options.key70: performance.normal-prio-threads
Volume1.options.value69: 16
Volume1.options.key69: performance.high-prio-threads
Volume1.options.value68: 16
Volume1.options.key68: performance.io-thread-count
Volume1.options.value67: 32MB
Volume1.options.key67: performance.cache-size
Volume1.options.value66:
Volume1.options.key66: performance.cache-priority
Volume1.options.value65: 1
Volume1.options.key65: performance.cache-refresh-timeout
Volume1.options.value64: 0
Volume1.options.key64: performance.cache-min-file-size
Volume1.options.value63: 0
Volume1.options.key63: performance.cache-max-file-size
Volume1.options.value62: 86400
Volume1.options.key62: diagnostics.stats-dnscache-ttl-sec
Volume1.options.value61: 65535
Volume1.options.key61: diagnostics.fop-sample-buf-size
Volume1.options.value60: 0
Volume1.options.key60: diagnostics.fop-sample-interval
Volume1.options.value59: 0
Volume1.options.key59: diagnostics.stats-dump-interval
Volume1.options.value58: 120
Volume1.options.key58: diagnostics.client-log-flush-timeout
Volume1.options.value57: 120
Volume1.options.key57: diagnostics.brick-log-flush-timeout
Volume1.options.value56: 5
Volume1.options.key56: diagnostics.client-log-buf-size
Volume1.options.value55: 5
Volume1.options.key55: diagnostics.brick-log-buf-size
Volume1.options.value54: (null)
Volume1.options.key54: diagnostics.client-log-format
Volume1.options.value53: (null)
Volume1.options.key53: diagnostics.brick-log-format
Volume1.options.value52: (null)
Volume1.options.key52: diagnostics.client-logger
Volume1.options.value51: (null)
Volume1.options.key51: diagnostics.brick-logger
Volume1.options.value50: CRITICAL
Volume1.options.key50: diagnostics.client-sys-log-level
Volume1.options.value49: CRITICAL
Volume1.options.key49: diagnostics.brick-sys-log-level
Volume1.options.value48: INFO
Volume1.options.key48: diagnostics.client-log-level
Volume1.options.value47: INFO
Volume1.options.key47: diagnostics.brick-log-level
Volume1.options.value46: off
Volume1.options.key46: diagnostics.count-fop-hits
Volume1.options.value45: off
Volume1.options.key45: diagnostics.dump-fd-stats
Volume1.options.value44: off
Volume1.options.key44: diagnostics.latency-measurement
Volume1.options.value43: true
Volume1.options.key43: cluster.stripe-coalesce
Volume1.options.value42: 128KB
Volume1.options.key42: cluster.stripe-block-size
Volume1.options.value41: none
Volume1.options.key41: cluster.favorite-child-policy
Volume1.options.value40: 128
Volume1.options.key40: cluster.heal-wait-queue-length
Volume1.options.value39: no
Volume1.options.key39: cluster.consistent-metadata
Volume1.options.value38: on
Volume1.options.key38: cluster.ensure-durability
Volume1.options.value37: 1
Volume1.options.key37: cluster.post-op-delay-secs
Volume1.options.value36: 1KB
Volume1.options.key36: cluster.self-heal-readdir-size
Volume1.options.value35: true
Volume1.options.key35: cluster.choose-local
Volume1.options.value34: (null)
Volume1.options.key34: cluster.quorum-count
Volume1.options.value33: none
Volume1.options.key33: cluster.quorum-type
Volume1.options.value32: on
Volume1.options.key32: disperse.eager-lock
Volume1.options.value31: on
Volume1.options.key31: cluster.eager-lock
Volume1.options.value30: (null)
Volume1.options.key30: cluster.data-self-heal-algorithm
Volume1.options.value29: on
Volume1.options.key29: cluster.metadata-change-log
Volume1.options.value28: on
Volume1.options.key28: cluster.data-change-log
Volume1.options.value27: 1
Volume1.options.key27: cluster.self-heal-window-size
Volume1.options.value26: 600
Volume1.options.key26: cluster.heal-timeout
Volume1.options.value25: on
Volume1.options.key25: cluster.self-heal-daemon
Volume1.options.value24: on
Volume1.options.key24: cluster.entry-self-heal
Volume1.options.value23: on
Volume1.options.key23: cluster.data-self-heal
Volume1.options.value22: on
Volume1.options.key22: cluster.metadata-self-heal
Volume1.options.value21: 8
Volume1.options.key21: cluster.background-self-heal-count
Volume1.options.value20: 1
Volume1.options.key20: cluster.read-hash-mode
Volume1.options.value19: -1
Volume1.options.key19: cluster.read-subvolume-index
Volume1.options.value18: (null)
Volume1.options.key18: cluster.read-subvolume
Volume1.options.value17: on
Volume1.options.key17: cluster.entry-change-log
Volume1.options.value16: (null)
Volume1.options.key16: cluster.switch-pattern
Volume1.options.value15: on
Volume1.options.key15: cluster.weighted-rebalance
Volume1.options.value14: (null)
Volume1.options.key14: cluster.local-volume-name
Volume1.options.value13: off
Volume1.options.key13: cluster.lock-migration
Volume1.options.value12: normal
Volume1.options.key12: cluster.rebal-throttle
Volume1.options.value11: off
Volume1.options.key11: cluster.randomize-hash-range-by-gfid
Volume1.options.value10: trusted.glusterfs.dht
Volume1.options.key10: cluster.dht-xattr-name
Volume1.options.value9: (null)
Volume1.options.key9: cluster.extra-hash-regex
Volume1.options.value8: (null)
Volume1.options.key8: cluster.rsync-hash-regex
Volume1.options.value7: off
Volume1.options.key7: cluster.readdir-optimize
Volume1.options.value6: (null)
Volume1.options.key6: cluster.subvols-per-directory
Volume1.options.value5: off
Volume1.options.key5: cluster.rebalance-stats
Volume1.options.value4: 5%
Volume1.options.key4: cluster.min-free-inodes
Volume1.options.value3: 10%
Volume1.options.key3: cluster.min-free-disk
Volume1.options.value2: off
Volume1.options.key2: cluster.lookup-optimize
Volume1.options.value1: on
Volume1.options.key1: cluster.lookup-unhashed
output truncated
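The table below lists cluster-wide options and their current values. A likely way of producing such output, assuming the gluster volume get command accepts the all keyword for both the volume and option fields, is:
# gluster volume get all all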
Option                                  Value
------                                  -----
cluster.server-quorum-ratio             51
cluster.enable-shared-storage           disable
cluster.op-version                      31101
cluster.brick-multiplex                 disable
cluster.max-bricks-per-process          0
all
Dumps all available state information.
mem
Dumps the memory usage and memory pool details of the bricks.
iobuf
Dumps iobuf details of the bricks.
priv
Dumps private information of loaded translators.
callpool
Dumps the pending calls of the volume.
fd
Dumps the open file descriptor tables of the volume.
inode
Dumps the inode tables of the volume.
history
Dumps the event history of the volume
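A sketch of the general statedump invocation that accepts these parameters, with VOLNAME as a placeholder for the volume name:
# gluster volume statedump VOLNAME [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]...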
For example, to write out all available information about the data volume, run the following command
on the server:
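A possible form, assuming the volume is named data:
# gluster volume statedump data all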
If you only want to see details about the event history, run the following:
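A possible form, again assuming a volume named data:
# gluster volume statedump data history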
The nfs parameter is required to gather details about volumes shared via NFS. It can be combined
with any of the above parameters to filter output.
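A sketch of such a combination, with VOLNAME as a placeholder:
# gluster volume statedump VOLNAME nfs all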
The quotad parameter is required to gather details about the quota daemon. The following command
writes out the state of the quota daemon across all nodes.
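A likely form of this command, with VOLNAME as a placeholder:
# gluster volume statedump VOLNAME quotad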
If you need to see the state of a different process, such as the self-heal daemon, you can do so by
running the following command using the process identifier of that process.
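A likely form of the command described here, assuming the statedump client subcommand is available in this release; VOLNAME, hostname, and pid are placeholders:
# gluster volume statedump VOLNAME client hostname:pid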
IMPORTANT
If you are using either NFS Ganesha or Samba service and you need to see the state of
its clients, ensure that you use localhost instead of hostname. For example:
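A sketch, with VOLNAME and pid as placeholders:
# gluster volume statedump VOLNAME client localhost:pid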
If you need to get the state of glusterfs fuse mount process, you can do so by running the following
command using the process identifier of that process.
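A sketch, where pid is the process identifier of the glusterfs fuse mount process:
# kill -SIGUSR1 pid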
IMPORTANT
If you have a gfapi based application and you need to see the state of its clients, ensure
that the user running the gfapi application is a member of the gluster group. For
example, if your gfapi application is run by user qemu, ensure that qemu is added to the
gluster group by running the following command:
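For instance, using the standard usermod utility:
# usermod -a -G gluster qemu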
To change where the output files of a particular volume are saved, use the server.statedump-path
parameter, like so:
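A sketch, with VOLNAME and the target directory as placeholders:
# gluster volume set VOLNAME server.statedump-path /path/to/directory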
You can use the --timeout option to prevent the commands from timing out after the default of 120 seconds.
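One possible invocation; the placement of the option shown here is an assumption:
# gluster volume status VOLNAME inode --timeout=600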
It is recommended to use the --timeout option when obtaining information about inodes, clients, or details, as these queries frequently time out.
The self-heal daemon status will be displayed only for replicated volumes.
Detailed information is not available for NFS and the self-heal daemon.
Display the list of clients accessing the volumes using the command:
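A likely form of this command, with VOLNAME as a placeholder:
# gluster volume status VOLNAME clients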
Display the memory usage and memory pool details of the bricks on a volume using the command:
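A sketch of the general form:
# gluster volume status VOLNAME mem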
For example, to display the memory usage and memory pool details for the bricks on test-volume:
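A likely invocation:
# gluster volume status test-volume mem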
Mempool Stats
-------------
Name                                HotCount  ColdCount  PaddedSizeof  AllocCount  MaxAlloc
----                                --------  ---------  ------------  ----------  --------
test-volume-server:fd_t                    0      16384            92          57         5
test-volume-server:dentry_t               59        965            84          59        59
test-volume-server:inode_t                60        964           148          60        60
test-volume-server:rpcsvc_request_t        0        525          6372         351         2
glusterfs:struct saved_frame               0       4096           124           2         2
glusterfs:struct rpc_req                   0       4096          2236           2         2
glusterfs:rpcsvc_request_t                 1        524          6372           2         1
glusterfs:call_stub_t                      0       1024          1220         288         1
glusterfs:call_stack_t                     0       8192          2084         290         2
glusterfs:call_frame_t                     0      16384           172        1728         6
LRU inodes:
GFID                                  Lookups  Ref  IA type
----                                  -------  ---  -------
80f98abe-cdcf-4c1d-b917-ae564cf55763        1    0        1
3a58973d-d549-4ea6-9977-9aa218f233de        1    0        1
2ce0197d-87a9-451b-9094-9baa38121155        1    0        2
Display the open file descriptor tables of the volume using the command:
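A likely form of this command, with VOLNAME as a placeholder:
# gluster volume status VOLNAME fd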
Connection 2:
RefCount = 0 MaxFDs = 128 FirstFree = 0
No open fds
Connection 3:
RefCount = 0 MaxFDs = 128 FirstFree = 0
No open fds
Frame 4
Ref Count = 1
Translator = test-volume-locks
Completed = No
Parent = test-volume-io-threads
Wind From = iot_fsync_wrapper
Wind To = FIRST_CHILD (this)->fops->fsync
Frame 5
Ref Count = 1
Translator = test-volume-io-threads
Completed = No
Parent = test-volume-marker
Wind From = default_fsync
Wind To = FIRST_CHILD(this)->fops->fsync
Frame 6
Ref Count = 1
Translator = test-volume-marker
Completed = No
Parent = /export/1
Wind From = io_stats_fsync
Wind To = FIRST_CHILD(this)->fops->fsync
Frame 7
Ref Count = 1
Translator = /export/1
Completed = No
Parent = test-volume-server
Wind From = server_fsync_resume
Wind To = bound_xl->fops->fsync
19.10.1. Troubleshooting a network issue in the Red Hat Gluster Storage Trusted Storage Pool
When enabling the network components to communicate with Jumbo frames in a Red Hat Gluster Storage Trusted Storage Pool, ensure that all the network components, such as switches and Red Hat Gluster Storage nodes, are configured properly. Verify the network configuration by running a ping from one Red Hat Gluster Storage node to another.
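A sketch of such a check, assuming an MTU of 9000 bytes (8972 = 9000 minus 28 bytes of IP and ICMP headers) and a reachable peer node named rhgs-node2:
# ping -M do -s 8972 rhgs-node2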
If the nodes in the Red Hat Gluster Storage Trusted Storage Pool or any other network components are not configured to fully support Jumbo frames, the ping command fails, either timing out or reporting that the packet is too large for the path MTU.
CHAPTER 20. MANAGING RESOURCE USAGE
On Red Hat Gluster Storage 3.2 and higher deployments based on Red Hat Enterprise Linux 7, this can
be configured using gdeploy. For more information, see Section 5.1.11, “Limiting Gluster Resources” .
On earlier versions of Red Hat Gluster Storage, it is necessary to manually configure a control group
slice for the glusterd service in order to manage glusterd's access to system resources.
Procedure 20.1. Limiting glusterd resources on RHEL7 based Red Hat Gluster Storage
# mkdir /etc/systemd/system/glusterd.service.d
# echo "[Service]
CPUAccounting=yes
Slice=glusterfs.slice" >> /etc/systemd/system/glusterd.service.d/99-cpu.conf
# echo "[Slice]
CPUQuota=400%" >> /etc/systemd/system/glusterfs.slice
You can alter the percentage to suit your environment by editing the CPUQuota value in the slice file, then reload the systemd configuration:
# systemctl daemon-reload
For more information about configuring resource management on Red Hat Enterprise Linux 7, see the Resource Management Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Resource_Management_Guide/index.html#sec-What_are_Control_Groups
Resource management works differently on Red Hat Enterprise Linux 6. See the Red Hat Enterprise Linux 6 Resource Management Guide for details: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html
CHAPTER 21. TUNING FOR PERFORMANCE
When configured across 12 disks, RAID 6 can provide ~40% more storage space in comparison to RAID
10, which has a 50% reduction in capacity. However, RAID 6 performance for small file writes and
random writes tends to be lower than RAID 10. If the workload is strictly small files, then RAID 10 is the
optimal configuration.
An important parameter in hardware RAID configuration is the stripe unit size. With thin provisioned
disks, the choice of RAID stripe unit size is closely related to the choice of thin-provisioning chunk size.
For RAID 6, the stripe unit size must be chosen such that the full stripe size (stripe unit * number of
data disks) is between 1 MiB and 2 MiB, preferably in the lower end of the range. Hardware RAID
controllers usually allow stripe unit sizes that are a power of 2. For RAID 6 with 12 disks (10 data disks),
the recommended stripe unit size is 128KiB.
21.1.2. JBOD
In the JBOD configuration, physical disks are not aggregated into RAID devices, but are visible as
separate disks to the operating system. This simplifies system configuration by not requiring a
hardware RAID controller.
If disks on the system are connected through a hardware RAID controller, refer to the RAID controller
documentation on how to create a JBOD configuration; typically, JBOD is realized by exposing raw
drives to the operating system using a pass-through mode.
In the JBOD configuration, a single physical disk serves as storage for a Red Hat Gluster Storage brick.
JBOD configurations support up to 36 disks per node with dispersed volumes and three-way
replication.
1. LVM layer
The steps for creating a brick from a physical device are listed below. An outline of the steps for creating multiple bricks on a physical device is listed as Example - Creating multiple bricks on a physical device below.
The pvcreate command is used to create the physical volume. The Logical Volume Manager can use a portion of the physical volume for storing its metadata, while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer by using the --dataalignment option while creating the physical volume.
In case of hardware RAID, the alignment_value should be obtained by multiplying the RAID
stripe unit size with the number of data disks. If 12 disks are used in a RAID 6 configuration,
the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10
configuration, the number of data disks is 6.
For example, the following command is appropriate for 12 disks in a RAID 6 configuration
with a stripe unit size of 128 KiB:
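A sketch, assuming the physical device is /dev/sdb (1280K = 128 KiB stripe unit x 10 data disks):
# pvcreate --dataalignment 1280K /dev/sdb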
The following command is appropriate for 12 disks in a RAID 10 configuration with a stripe
unit size of 256 KiB:
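Again assuming /dev/sdb (1536K = 256 KiB stripe unit x 6 data disks):
# pvcreate --dataalignment 1536K /dev/sdb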
To view the previously configured physical volume settings for --dataalignment, run
the following command:
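For instance, assuming /dev/sdb:
# pvs -o +pe_start /dev/sdb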
For hardware RAID, in order to ensure that logical volumes created in the volume group are aligned with the underlying RAID geometry, it is important to use the --physicalextentsize option. Execute the vgcreate command in the following format:
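A sketch of the format, with VOLGROUP and the physical volume as placeholders:
# vgcreate --physicalextentsize extent_size VOLGROUP physical_volume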
The extent_size should be obtained by multiplying the RAID stripe unit size with the
number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data
disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of
data disks is 6.
For example, run the following command for RAID-6 storage with a stripe unit size of
128 KB, and 12 disks (10 data disks):
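A sketch, assuming a volume group named rhs_vg on /dev/sdb:
# vgcreate --physicalextentsize 1280K rhs_vg /dev/sdb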
In the case of JBOD, use the vgcreate command in the following format:
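A sketch of the format:
# vgcreate VOLGROUP physical_volume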
A thin pool provides a common pool of storage for thin logical volumes (LVs) and their
snapshot volumes, if any.
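A sketch of a thin pool creation command, with the volume group, pool name, and sizes as placeholders:
# lvcreate --thin VOLGROUP/POOLNAME --size pool_sz --chunksize chunk_sz --poolmetadatasize metadev_sz --zero n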
You can also create a thin pool of the maximum possible size for your device by executing
the following command:
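A possible form, using all remaining free space in the volume group:
# lvcreate --thin VOLGROUP/POOLNAME --extents 100%FREE --chunksize chunk_sz --poolmetadatasize metadev_sz --zero n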
poolmetadatasize
Internally, a thin pool contains a separate metadata device that is used to track the (dynamically) allocated regions of the thin LVs and snapshots. The poolmetadatasize option in the above command refers to the size of the pool metadata device.
The maximum possible size for a metadata LV is 16 GiB. Red Hat Gluster Storage
recommends creating the metadata device of the maximum supported size. You can
allocate less than the maximum if space is a concern, but in this case you should
allocate a minimum of 0.5% of the pool size.
chunksize
An important parameter to be specified while creating a thin pool is the chunk size, which is the unit of allocation. For good performance, the chunk size for the thin pool and the parameters of the underlying hardware RAID storage should be chosen so that they work well together.
For RAID-6 storage, the striping parameters should be chosen so that the full stripe
size (stripe_unit size * number of data disks) is between 1 MiB and 2 MiB, preferably in
the low end of the range. The thin pool chunk size should be chosen to match the RAID
6 full stripe size. Matching the chunk size to the full stripe size aligns thin pool
allocations with RAID 6 stripes, which can lead to better performance. Limiting the
chunk size to below 2 MiB helps reduce performance problems due to excessive copy-
on-write when snapshots are used.
For example, for RAID 6 with 12 disks (10 data disks), stripe unit size should be chosen
as 128 KiB. This leads to a full stripe size of 1280 KiB (1.25 MiB). The thin pool should
then be created with the chunk size of 1280 KiB.
For RAID 10 storage, the preferred stripe unit size is 256 KiB. This can also serve as the
thin pool chunk size. Note that RAID 10 is recommended when the workload has a large
proportion of small file writes or random writes. In this case, a small thin pool chunk
size is more appropriate, as it reduces copy-on-write overhead with snapshots.
block zeroing
By default, the newly provisioned chunks in a thin pool are zeroed to prevent data
leaking between different block devices. In the case of Red Hat Gluster Storage, where
data is accessed via a file system, this option can be turned off for better performance
with the --zero n option. Note that n does not need to be replaced.
You can also use --extents 100%FREE to ensure the thin pool takes up all available
space once the metadata pool is created.
The following example creates a thin pool that takes up all remaining space once the
metadata pool has been created.
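A sketch, assuming a volume group named rhs_vg, a pool named rhs_pool, and the RAID 6 chunk size discussed above:
# lvcreate --thin rhs_vg/rhs_pool --extents 100%FREE --chunksize 1280K --poolmetadatasize 16G --zero n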
After the thin pool has been created as mentioned above, a thinly provisioned logical
volume can be created in the thin pool to serve as storage for a brick of a Red Hat Gluster
Storage volume.
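A sketch of creating such a thin LV; the name and virtual size are assumptions:
# lvcreate --thin --name rhs_lv --virtualsize 1T rhs_vg/rhs_pool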
The steps above (LVM Layer) cover the case where a single brick is being created on a
physical device. This example shows how to adapt these steps when multiple bricks need
to be created on a physical device.
3. Create a separate thin pool for each brick using the following commands:
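A sketch, assuming two bricks and the pool names, sizes, and RAID 6 chunk size used in this chapter:
# lvcreate --thin rhs_vg/rhs_pool1 --size 1T --chunksize 1280K --poolmetadatasize 16G --zero n
# lvcreate --thin rhs_vg/rhs_pool2 --size 1T --chunksize 1280K --poolmetadatasize 16G --zero n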
In the examples above, the size of each thin pool is chosen to be the same as the size
of the brick that will be created in it. With thin provisioning, there are many possible
ways of managing space, and these options are not discussed in this chapter.
5. Follow the XFS Recommendations (next step) in this chapter for creating and mounting
filesystems for each of the thin logical volumes
2. XFS Recommendations
As Red Hat Gluster Storage makes extensive use of extended attributes, an XFS inode size of 512 bytes works better with Red Hat Gluster Storage than the default XFS inode size of 256 bytes. So, the inode size for XFS must be set to 512 bytes while formatting the Red Hat Gluster Storage bricks. To set the inode size, use the -i size option with the mkfs.xfs command as shown in the following Logical Block Size for the Directory section.
When creating an XFS file system, you can explicitly specify the striping parameters of the
underlying storage in the following format:
# mkfs.xfs other_options -d su=stripe_unit_size,sw=stripe_width_in_number_of_disks device
For RAID 6, ensure that I/O is aligned at the file system layer by providing the striping parameters. For RAID 6 storage with 12 disks, if the recommendations above have been followed, the values are as follows:
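A sketch, assuming su=128k and sw=10 and the logical volume path used in these examples:
# mkfs.xfs -f -i size=512 -d su=128k,sw=10 /dev/rhs_vg/rhs_lv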
For RAID 10 and JBOD, the -d su=<>,sw=<> option can be omitted. By default, XFS uses the thin pool chunk size and other parameters to make layout decisions.
An XFS file system allows you to select a logical block size for the file system directories that is greater than the logical block size of the file system. Increasing the logical block size for the directories from the default 4 KB decreases the directory I/O, which in turn improves the performance of directory operations. To set the block size, use the -n size option with the mkfs.xfs command as shown in the following example.
The following is an example for a RAID 6 configuration, including the inode and block size options:
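A sketch of such a command; the logical volume path is an assumption:
# mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/rhs_vg/rhs_lv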
Allocation Strategy
inode32 and inode64 are the two most common allocation strategies for XFS. With the inode32 allocation strategy, XFS places all the inodes in the first 1 TiB of the disk; with a larger disk, all the inodes would therefore be confined to the first 1 TiB. inode32 is the default allocation strategy. With the inode64 mount option, inodes are placed near the data, which minimizes disk seeks.
To set the allocation strategy to inode64 when the file system is mounted, use the -o inode64 option with the mount command as shown in the following Access Time section.
Access Time
If the application does not require the access time on files to be updated, then the file system must always be mounted with the noatime mount option. For example:
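A sketch, assuming the logical volume above and a brick mount point of /rhgs/brick1:
# mount -t xfs -o inode64,noatime /dev/rhs_vg/rhs_lv /rhgs/brick1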
This optimization improves performance of small-file reads by avoiding updates to the XFS
inodes when files are read.
Allocation groups
Each XFS file system is partitioned into regions called allocation groups. Allocation groups
are similar to the block groups in ext3, but allocation groups are much larger than block
groups and are used for scalability and parallelism rather than disk locality. The default
allocation for an allocation group is 1 TiB.
The allocation group count must be large enough to sustain the concurrent allocation workload. In most cases, the allocation group count chosen by the mkfs.xfs command gives optimal performance. Do not change the allocation group count chosen by mkfs.xfs while formatting the file system.
If the workload consists of very small files (average file size less than 10 KB), it is recommended to set the maxpct value to 10 while formatting the file system.
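A sketch of such an invocation, combining maxpct with the inode size recommendation above:
# mkfs.xfs -f -i size=512,maxpct=10 -n size=8192 -d su=128k,sw=10 /dev/rhs_vg/rhs_lv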
Earlier versions of Red Hat Gluster Storage on Red Hat Enterprise Linux 6 recommended tuned
profiles rhs-high-throughput and rhs-virtualization. These profiles are still available
on Red Hat Enterprise Linux 6. However, switching to the new profiles is recommended.
To apply tunings contained in the tuned profile, run the following command after creating a
Red Hat Gluster Storage volume.
For example:
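A sketch, assuming the random-I/O profile shipped with Red Hat Gluster Storage (rhgs-random-io; rhgs-sequential-io is the other commonly shipped profile) suits the workload:
# tuned-adm profile rhgs-random-io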
4. Writeback Caching
For small-file and random write performance, we strongly recommend writeback cache, that is,
non-volatile random-access memory (NVRAM) in your storage controller. For example, normal
Dell and HP storage controllers have it. Ensure that NVRAM is enabled, that is, that the battery is working. Refer to your hardware documentation for details on enabling NVRAM.
Do not enable writeback caching in the disk drives themselves; with that policy, the disk drive considers a write complete before it has actually reached the magnetic media (platter). As a result, the disk write cache might lose its data during a power failure, or even lose metadata, leading to file system corruption.
As of Red Hat Gluster Storage 3.3, brick multiplexing is supported only for Container-Native Storage (CNS) and Container-Ready Storage (CRS) use cases.
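Brick multiplexing is controlled by a cluster-wide volume option; a sketch of enabling it:
# gluster volume set all cluster.brick-multiplex on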
IMPORTANT
Brick compatibility is determined when the volume starts, and depends on volume
options shared between bricks. When brick multiplexing is enabled, Red Hat
recommends restarting the volume whenever any volume configuration details are
changed in order to maintain the compatibility of the bricks grouped under a single
process.
21.3. NETWORK
The data traffic network becomes a bottleneck as the number of storage nodes increases. By adding a 10GbE or faster network for data traffic, you can achieve faster per-node performance. Jumbo frames must be enabled at all levels, that is, at the client, Red Hat Gluster Storage node, and ethernet switch levels. An MTU of size N+208 must be supported by the ethernet switch, where N=9000. We recommend a separate network for management and data traffic when protocols like NFS/CIFS are used instead of the native client. The preferred bonding mode for Red Hat Gluster Storage clients is mode 6 (balance-alb); this allows clients to transmit writes in parallel on separate NICs much of the time.
21.4. MEMORY
Red Hat Gluster Storage does not consume significant compute resources from the storage nodes
themselves. However, read intensive workloads can benefit greatly from additional RAM.
vm.dirty_ratio
vm.dirty_background_ratio
The appropriate values of these parameters vary with the type of workload:
Large-file sequential I/O workloads benefit from higher values for these parameters.
For small-file and random I/O workloads it is recommended to keep these parameter values
low.
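A sketch of inspecting the current values with sysctl:
# sysctl vm.dirty_ratio vm.dirty_background_ratio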
The Red Hat Gluster Storage tuned profiles set the values for these parameters appropriately. Hence, it
is important to select and activate the appropriate Red Hat Gluster Storage profile based on the
workload.
You can tune the Red Hat Gluster Storage Server performance by tuning the event thread values.
Example 21.1. Tuning the event threads for a client accessing a volume
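A sketch, assuming a volume named test-vol and a value of 4:
# gluster volume set test-vol client.event-threads 4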
You can tune the Red Hat Gluster Storage Server performance using event thread values.
Example 21.2. Tuning the event threads for a server accessing a volume
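A sketch, again assuming test-vol and a value of 4:
# gluster volume set test-vol server.event-threads 4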
You can verify the event thread values that are set for the client and server components by executing
the following command:
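One way to check is the volume info output, which lists the reconfigured options; assuming test-vol:
# gluster volume info test-vol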
See the topic Configuring Volume Options for information on the minimum, maximum, and default values for setting these volume options.
1. As each thread processes a connection at a time, having more threads than connections to either the brick processes (glusterfsd) or the client processes (glusterfs or gfapi) is not recommended. For this reason, monitor the connection counts (using the netstat command) on the clients and on the bricks to arrive at an appropriate number for the event thread count.
2. Configuring a higher event threads value than the available processing units could again cause context switches on these threads. As a result, it is recommended to reduce the number deduced from the previous step to a number that is less than the available processing units.
3. If a Red Hat Gluster Storage volume has a high number of brick processes running on a single
node, then reducing the event threads number deduced in the previous step would help the
competing processes to gain enough concurrency and avoid context switches across the
threads.
4. If a specific thread consumes more CPU cycles than needed, increasing the event thread count would enhance the performance of the Red Hat Gluster Storage Server.
6. Another parameter that could improve the performance when tuning the event-threads value
is to set the performance.io-thread-count (and its related thread-counts) to higher
values, as these threads perform the actual IO operations on the underlying file system.
Negative lookups are expensive and typically slow down file creation, as DHT attempts to find the file in all sub-volumes. This especially impacts small-file performance, where a large number of files are being added or created in quick succession on the volume.
The negative lookup fan-out behavior can be optimized by not performing these lookups in a balanced volume.
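A sketch of enabling the relevant option, with VOLNAME as a placeholder:
# gluster volume set VOLNAME cluster.lookup-optimize on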
NOTE
The configuration takes effect for newly created directories immediately after the above option is set. For existing directories, a rebalance is required to ensure the volume is in balance before DHT applies the optimization on older directories.
21.6. REPLICATION
If a system is configured for two-way, active-active replication, write throughput will generally be half of what it would be in a non-replicated configuration. However, read throughput is generally improved by replication, as reads can be delivered from either storage node.
Creating files
Deleting files
Renaming files
NOTE
If the majority of the workload modifies the same set of files and directories simultaneously from multiple clients, then enabling metadata caching might not provide the desired performance improvement.
1. Execute the command shown in the sketch after this list to enable metadata caching and cache invalidation. This is a group set option, which sets multiple volume options in a single command.
2. To increase the number of files that can be cached, execute the second command in the sketch after this list, where n is set to 50000. It can be increased if the number of active files in the volume is very high. Increasing this number increases the memory footprint of the brick processes.
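A sketch of the two commands referenced in the steps above, with VOLNAME as a placeholder; the group name metadata-cache is assumed to be available on this release:
# gluster volume set VOLNAME group metadata-cache
# gluster volume set VOLNAME network.inode-lru-limit 50000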
CHAPTER 22. NAGIOS CONFIGURATION FILES
A Cluster Utilization service to monitor overall utilization of volumes in the cluster. This
is created only if there is any volume present in the cluster.
The following service configurations are generated for each volume in the trusted storage
pool:
A Volume Quota - Volume-Name service to monitor the Quota status of the volume, if
Quota is enabled for the volume.
The following services are created for each brick in the node:
CHAPTER 23. CONFIGURING NETWORK ENCRYPTION IN RED HAT GLUSTER STORAGE
Red Hat Gluster Storage supports network encryption using TLS/SSL. Red Hat Gluster Storage uses
TLS/SSL for authentication and authorization, in place of the home grown authentication framework
used for normal connections. Red Hat Gluster Storage supports the following encryption types:
I/O encryption - encryption of the I/O connections between the Red Hat Gluster Storage
clients and servers
/etc/ssl/glusterfs.key - This file contains the system's unique private key. This file must
not be shared with others.
The CA file on the clients must contain the certificates of the signing CA for all the servers. In
case self-signed certificates are being used, the CA file for the servers is a concatenation of
the certificate files /etc/ssl/glusterfs.pem of every server and every client. The client
CA file is a concatenation of the certificate files of every server.
23.1. PREREQUISITES
Before setting up the network encryption, you must first generate a private key and a signed
certificate for each system and place it in the respective folders. You must generate a private key and a
signed certificate for both clients and servers.
Perform the following to generate a private key and a signed certificate for both clients and servers:
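1. Generate a private key for the system (a sketch, assuming a 2048-bit RSA key):
# openssl genrsa -out /etc/ssl/glusterfs.key 2048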
2. Use the generated private key to create a signed certificate by running the following
command:
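A likely form of this command, with COMMONNAME as a placeholder:
# openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=COMMONNAME" -days 365 -out /etc/ssl/glusterfs.pem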
If your organization has a common CA, the certificate can be signed by it. To do this a
certificate signing request (CSR) must be generated by running the following command:
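A likely form of the CSR generation command, with COMMONNAME as a placeholder:
# openssl req -new -key /etc/ssl/glusterfs.key -subj "/CN=COMMONNAME" -out glusterfs.csr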
The common name in this command can be a hostname, FQDN, IP address, and so on. The generated glusterfs.csr file should be given to the CA, and the CA will provide a .pem file containing the signed certificate. Place that signed glusterfs.pem file in the /etc/ssl/ directory.
By default, the SSL certificate expires after 30 days. You can use the -days option to specify the validity of the certificate based on your requirement. In the above command, the certificate is valid for 365 days (1 year).
3. 1. For self-signed CA certificates on servers, collect the .pem certificates of clients and servers, that is, the /etc/ssl/glusterfs.pem files from every system. Concatenate the collected files into a single file. Place this file in /etc/ssl/glusterfs.ca on all the servers in the trusted storage pool. If you are using a common CA, collect the certificate file from the CA and place it in /etc/ssl/glusterfs.ca on all the servers.
2. For self-signed CA certificates on clients, collect the .pem certificates of servers, that is, the /etc/ssl/glusterfs.pem files from every server. Concatenate the collected files into a single file. Place this file in /etc/ssl/glusterfs.ca on all the clients. If you are using a common CA, collect the certificate file from the CA and place it in /etc/ssl/glusterfs.ca on all the clients.
On Servers
Perform the following on all the servers
# touch /var/lib/glusterd/secure-access
3. Setup the trusted storage pool by running appropriate peer probe commands. For more
information on setting up the trusted storage pool, see Chapter 4, Adding Servers to the
Trusted Storage Pool
On Clients
Perform the following on all the client machines
# touch /var/lib/glusterd/secure-access
2. Mount the volume on all the clients. For example, to manually mount a volume and access data
using Native client, use the following command:
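A sketch, assuming a server named server1, a volume named test-volume, and a mount point of /mnt/glusterfs:
# mount -t glusterfs server1:/test-volume /mnt/glusterfs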
2. Set the list of common names of all the servers to access the volume. Be sure to include the common names of clients which will be allowed to access the volume.
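A sketch, with example common names:
# gluster volume set test-volume auth.ssl-allow 'server1,server2,client1,client2'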
3. Enable Transport Layer Security on the volume by setting the client.ssl and server.ssl
options to on.
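A sketch, assuming the same volume:
# gluster volume set test-volume client.ssl on
# gluster volume set test-volume server.ssl on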
5. Mount the volume on all the clients which has been authorized. For example, to manually
mount a volume and access data using Native client, use the following command:
# umount mount-point
3. Set the list of common names for clients allowed to access the volume. Be sure to include the common names of all the servers.
4. Enable Transport Layer Security on the volume by setting the client.ssl and server.ssl
options to on.
6. Mount the volume from the new clients. For example, to manually mount a volume and access
data using Native client, use the following command:
Though Red Hat Gluster Storage can be configured for I/O encryption alone, without management encryption, management encryption is recommended. On an existing installation with running servers and clients, schedule a downtime of volumes, applications, clients, and other end users to enable management encryption.
You cannot currently change between unencrypted and encrypted connections dynamically. Bricks
and other local services on the servers and clients do not receive notifications from glusterd if they
are running when the switch to management encryption is made.
# umount mount-point
2. If you are using either NFS Ganesha or Samba service, then stop the service. For more
information regarding NFS Ganesha see, Section 6.2.3, “NFS Ganesha” . For more information
regarding Samba, see Section 6.3, “SMB”.
3. If shared storage is being used, then unmount the shared storage on all nodes
# umount /var/run/gluster/shared_storage
# pkill glusterfs
# touch /var/lib/glusterd/secure-access
11. If you are using either NFS Ganesha or Samba service, then start the service. For more
information regarding NFS Ganesha see, Section 6.2.3, “NFS Ganesha” . For more information
regarding Samba, see Section 6.3, “SMB”.
12. Mount the volume on all the clients. For example, to manually mount a volume and access data
using Native client, use the following command:
1. Copy the /etc/ssl/glusterfs.ca file from one of the existing servers and save it in the /etc/ssl/ directory on the new server.
# touch /var/lib/glusterd/secure-access
4. Add the common name of the new server to the auth.ssl-allow list for all volumes which
have encryption enabled.
NOTE
The gluster volume set command does not append to the existing values of the
options. To append the new name to the list, get the existing list using the gluster
volume info command, append the new name to the list, and set the option
again using the gluster volume set command.
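For example, assuming the existing list is server1,server2,client1 and the new server's common name is server3 (names and volume are illustrative):
# gluster volume info test-volume | grep auth.ssl-allow
auth.ssl-allow: server1,server2,client1
# gluster volume set test-volume auth.ssl-allow 'server1,server2,server3,client1'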
5. Run gluster peer probe [server] to add additional servers to the trusted storage pool. For
more information on adding servers to the trusted storage pool, see Chapter 4, Adding Servers
to the Trusted Storage Pool.
Using self-signed certificates requires a downtime of the servers to add a new server into the trusted
storage pool, because the CA list cannot be dynamically reloaded. To add a new server:
1. Generate the private key and self-signed certificate on the new server using the steps listed at
Section 23.1, “Prerequisites”.
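A minimal sketch of generating the key and self-signed certificate (the common name and key length are placeholders; follow Section 23.1, "Prerequisites" for the exact options):
# openssl genrsa -out /etc/ssl/glusterfs.key 2048
# openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=newserver.example.com" -out /etc/ssl/glusterfs.pem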
2. On an existing server, copy the /etc/ssl/glusterfs.ca file, append the content of the new
server's certificate to it, and distribute it to all servers, including the new server.
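As a sketch, assuming the new server's certificate has been copied to the existing server as /tmp/newserver.pem (the path is illustrative), append it and then distribute the updated file to every server, including the new one, for example with scp:
# cat /tmp/newserver.pem >> /etc/ssl/glusterfs.ca
# scp /etc/ssl/glusterfs.ca otherserver:/etc/ssl/glusterfs.ca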
# pkill glusterfs
6. Add the common name of the new server to the auth.ssl-allow list for all volumes which
have encryption enabled.
NOTE
The gluster volume set command does not append to the existing values of the
options. To append the new name to the list, get the existing list using the gluster
volume info command, append the new name to the list, and set the option
again using the gluster volume set command.
7. Restart all the glusterfs processes on existing servers and clients by performing the following steps.
# umount mount-point
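On the server side, the management daemon can then be restarted so that it picks up the updated CA file; on systemd-based systems this is typically:
# systemctl restart glusterd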
5. Mount the volume on all the clients. For example, to manually mount a volume and access
data using Native client, use the following command:
8. Peer probe the new server to add it to the trusted storage pool. For more information on peer
probe, see Chapter 4, Adding Servers to the Trusted Storage Pool.
1. Generate the glusterfs.key private key and the glusterfs.csr certificate signing request.
Send the glusterfs.csr to the CA for verification and obtain the signed glusterfs.pem from the CA.
Generate the private key and signed certificate for the new client and place the files in the
appropriate locations using the steps listed at Section 23.1, "Prerequisites".
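A minimal sketch of generating the key and the certificate signing request (the common name is a placeholder; your CA may require additional fields):
# openssl genrsa -out /etc/ssl/glusterfs.key 2048
# openssl req -new -key /etc/ssl/glusterfs.key -subj "/CN=newclient.example.com" -out glusterfs.csr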
2. Copy the /etc/ssl/glusterfs.ca file from another client and place it in the /etc/ssl/
directory on the new client.
# touch /var/lib/glusterd/secure-access
4. Set the list of common names of all the servers to access the volume. Be sure to include the
common names of clients which will be allowed to access the volume.
NOTE
The gluster volume set command does not append to the existing values of the
options. To append the new name to the list, get the existing list using the gluster
volume info command, append the new name to the list, and set the option
again using the gluster volume set command.
5. Mount the volume from the new client. For example, to manually mount a volume and access
data using Native client, use the following command:
NOTE
To authorize a new client to access the Red Hat Gluster Storage trusted storage pool using a self-signed
certificate, perform the following steps.
1. Generate the glusterfs.key private key and glusterfs.pem certificate for the client, and
place them at the appropriate locations on the client using the steps listed at Section 23.1,
"Prerequisites".
2. Copy /etc/ssl/glusterfs.ca file from one of the clients, and add it to the new client.
# touch /var/lib/glusterd/secure-access
4. Copy the /etc/ssl/glusterfs.ca file from one of the existing servers, append the content of
the new client's certificate to it, and distribute the updated CA file to all servers.
5. Set the list of common names for clients allowed to access the volume. Be sure to include the
common names of all the servers.
NOTE
The gluster volume set command does not append to the existing values of the
options. To append the new name to the list, get the existing list using the gluster
volume info command, append the new name to the list, and set the option
again using the gluster volume set command.
8. Mount the volume from the new client. For example, to manually mount a volume and access
data using Native client, use the following command:
CHAPTER 24. RESOLVING COMMON ISSUES
For more information on performing a statedump, see Section 19.8, "Viewing complete volume state
with statedump".
1. Perform statedump on the volume to view the files that are locked using the following
command:
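For example, for the test-volume volume referenced later in this section:
# gluster volume statedump test-volume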
The statedump files are created on the brick servers in the /tmp directory or in the
directory set using the server.statedump-path volume option. The naming convention of
the dump file is brick-path.brick-pid.dump.
The following are the sample contents of the statedump file indicating an entry lock (entrylk).
Ensure that those are stale locks and no resources own them.
[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
entrylk-count=1
lock-dump.domain.domain=vol-replicate-0
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0]
(ACTIVE)=type=ENTRYLK_WRLCK on basename=file1, pid = 714782904,
owner=ffffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27
16:01:01 2012
conn.2.bound_xl./rhgs/brick1.hashsize=14057
conn.2.bound_xl./rhgs/brick1.name=/gfs/brick1/inode
conn.2.bound_xl./rhgs/brick1.lru_limit=16384
conn.2.bound_xl./rhgs/brick1.active_size=2
conn.2.bound_xl./rhgs/brick1.lru_size=0
conn.2.bound_xl./rhgs/brick1.purge_size=0
The following are the sample contents of the statedump file indicating there is an inode lock
(inodelk). Ensure that those are stale locks and no resources own them.
[conn.2.bound_xl./rhgs/brick1.active.1]
gfid=538a3d4a-01b0-4d03-9dc9-843cd8704d07
nlookup=1
ref=2
ia_type=1
[xlator.features.locks.vol-locks.inode]
path=/file1
mandatory=0
inodelk-count=1
lock-dump.domain.domain=vol-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid
= 714787072, owner=00ffff2a3c7f0000, transport=0x20e0670, , granted
at Mon Feb 27 16:01:01 2012
The following are the sample contents of the statedump file indicating there is a granted
POSIX lock. Ensure that those are stale locks and no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=15
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid
= 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at
Mon Feb 27 16:01:01 2012
, granted at Mon Feb 27 16:01:01 2012
The following are the sample contents of the statedump file indicating there is a blocked
POSIX lock. Ensure that those are stale locks and no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=30
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=1, pid
= 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at
Mon Feb 27 16:01:01 2012
, granted at Mon Feb 27 16:01:01
...
The following are the sample contents of the statedump file indicating that there are POSIX
locks. Ensure that those are stale locks and no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=11
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid
= 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , blocked at
Mon Feb 27 16:01:01 2012
, granted at Mon Feb 27 16:01:01 2012
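Stale locks such as these can be cleared with the gluster volume clear-locks command. The following is an illustrative sketch; the path, basename, and ranges must match the entries reported in your statedump:
# gluster volume clear-locks test-volume / kind granted entry file1
# gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0
# gluster volume clear-locks test-volume /file1 kind granted posix 0,8-1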
You can perform statedump on test-volume again to verify that all the above locks are cleared.
The heal info command lists the GFIDs of the files that need to be healed. If you want to find the paths
of the files associated with the GFIDs, use the getfattr utility. The getfattr utility enables you to
locate a file residing on a gluster volume brick. You can retrieve the path of a file even if the filename is
unknown.
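A sketch of the query, run against the FUSE mount of the volume:
# getfattr -n trusted.glusterfs.pathinfo -e text <path-to-fuse-mount>/<filename>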
Where,
filename: The name of the file for which the path information is to be retrieved.
For example:
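A concrete invocation might look like this (mount point and file name are illustrative):
# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/test-volume/dir/file1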
The command output displays the brick pathinfo under the <POSIX> tag. In this example output, two
paths are displayed as the file is replicated twice and resides on a two-way replicated volume.
NOTE
If only the GFID of the file is known, mount the volume with the aux-gfid-mount option and query the
trusted.glusterfs.pathinfo extended attribute of the corresponding .gfid entry on that mount, as
sketched below.
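For example (host, volume, mount point, and GFID are placeholders):
# mount -t glusterfs -o aux-gfid-mount server1:/test-volume /mnt/aux-mount
# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/aux-mount/.gfid/<GFID>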
The command output displays the brick pathinfo under the <POSIX> tag. In this example
output, two paths are displayed as the file is replicated twice and resides on a two-way
replicated volume.
PART VIII. APPENDICES
The trusted storage pool remains available while changes are made to the underlying hardware.
As storage is added to the trusted storage pool, volumes can be rebalanced across the pool to
accommodate the added storage capacity.
The glusterd service is started automatically on all servers in the trusted storage pool. The service
can also be manually started and stopped as required.
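For example, on systemd-based systems such as Red Hat Enterprise Linux 7:
# systemctl start glusterd
# systemctl stop glusterd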
When a Red Hat Gluster Storage server node that hosts a very large number of bricks or snapshots is
upgraded, cluster management commands may become unresponsive as glusterd attempts to start all
brick processes concurrently for all bricks and snapshots. If a single node hosts more than 250 bricks
or snapshots, Red Hat recommends deactivating snapshots until the upgrade is complete.
CHAPTER 26. MANUALLY RECOVERING FILE SPLIT-BRAIN
1. Run the following command to obtain the path of the file that is in split-brain:
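For example, the list of files in split-brain can be obtained with (the volume name is a placeholder):
# gluster volume heal test-volume info split-brain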
From the command output, identify the files for which file operations performed from the client
keep failing with Input/Output error.
2. Close the applications that have opened the split-brain file from the mount point. If you are using a
virtual machine, you must power off the machine.
3. Obtain and verify the AFR changelog extended attributes of the file using the getfattr
command. Then identify the type of split-brain to determine which of the bricks contains the
'good copy' of the file.
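A sketch of the query, run directly on a brick (the file path on the brick is a placeholder):
# getfattr -d -m . -e hex <file-path-on-brick>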
For example, consider a replicated volume whose gluster volume info output ends with the following excerpt:
brick6: server1:/rhgs/brick6
brick7: server1:/rhgs/brick7
brick8: server1:/rhgs/brick8
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Each file in a brick maintains the changelog of itself and of the files present in all the other
bricks in its replica set, as seen by that brick.
In the example volume given above, all files in brick-a will have two entries, one for itself and the
other for the file present in its replica pair. The following is the changelog for brick2:
NOTE
These files do not have entries for themselves, only for the other bricks in the
replica. For example, brick1 will only have trusted.afr.vol-client-1 set
and brick2 will only have trusted.afr.vol-client-0 set. Interpreting the
changelog remains the same as explained below.
Each extended attribute has a value which is 24 hexadecimal digits. The first 8 digits represent the
changelog of data, the second 8 digits represent the changelog of metadata, and the last 8 digits
represent the changelog of directory entries.
For directories, metadata and entry changelogs are valid. For regular files, data and metadata
changelogs are valid. For special files like device files and so on, metadata changelog is valid.
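As an illustrative breakdown, a hypothetical value of 0x000003d70000000100000000 splits into 000003d7 (data changelog), 00000001 (metadata changelog), and 00000000 (entry changelog), indicating pending data and metadata operations but no pending entry operations.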
When a file split-brain happens, it can be a data split-brain, a metadata split-brain, or both.
The following is an example of both data and metadata split-brain on the same file:
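A sketch of what the attributes might look like on the two bricks in such a case (the values are illustrative, chosen to be consistent with the resolution commands shown below):
# getfattr -d -m . -e hex /rhgs/brick1/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000100000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
# getfattr -d -m . -e hex /rhgs/brick2/a
trusted.afr.vol-client-0=0x000003b00000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57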
Here, both copies have data and metadata changes that are not present on the other copy. Hence, it is
both a data and metadata split-brain.
You must change the changelog extended attributes on the files as if some metadata
operations succeeded on /rhgs/brick2/a but failed on /rhgs/brick1/a. However,
/rhgs/brick1/a should not have any changelog which says that some metadata operations
succeeded on /rhgs/brick1/a but failed on /rhgs/brick2/a. You must reset the metadata
part of the changelog in trusted.afr.vol-client-1 of /rhgs/brick1/a:
# setfattr -n trusted.afr.vol-client-0 -v
0x000000000000000100000000 /rhgs/brick2/a
# setfattr -n trusted.afr.vol-client-1 -v
0x000003d70000000000000000 /rhgs/brick1/a
After you reset the extended attributes, the changelogs would look similar to the following:
#file: rhgs/brick2/a
trusted.afr.vol-client-0=0x000000000000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
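The corresponding attributes on the other brick, as implied by the second setfattr command above (the trusted.afr.vol-client-0 value shown is illustrative):
#file: rhgs/brick1/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57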
For example:
On brick-a the directory has two entries, file1 with gfid_x and file2. On brick-b the
directory has two entries, file1 with gfid_y and file3. Here the GFIDs of file1 on the two bricks
are different. This kind of directory split-brain needs human intervention to resolve the
issue. You must remove either file1 on brick-a or file1 on brick-b to resolve the
split-brain.
In addition, the corresponding gfid-link file must be removed. The gfid-link files are
present in the .glusterfs directory in the top-level directory of the brick. If the gfid of the
file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute
received from the getfattr command earlier), the gfid-link file can be found at
/rhgs/brick1/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b.
WARNING
Before deleting the gfid-link, you must ensure that there are no hard
links to the file present on that brick. If hard-links exist, you must delete
them.
Trigger self-heal by accessing the file from the client mount point, for example by listing it:
# ls -l <file-path-on-gluster-mount>
or by triggering a heal on the volume, as sketched below.
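For example (the volume name is a placeholder; the full variant crawls and heals the entire volume):
# gluster volume heal VOLNAME
# gluster volume heal VOLNAME full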
APPENDIX A. REVISION HISTORY