ESS SDG
6.1.3.1
IBM
SC27-9859-02
Note
Before using this information and the product it supports, read the information in “Notices” on page
147.
This edition applies to Version 6 release 1 modification 3 of the following product and to all subsequent releases and
modifications until otherwise indicated in new editions:
• IBM Spectrum® Scale Data Management Edition for IBM® ESS (product number 5765-DME)
• IBM Spectrum Scale Data Access Edition for IBM ESS (product number 5765-DAE)
IBM welcomes your comments; see the topic “How to submit your comments” on page xv. When you send information
to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without
incurring any obligation to you.
© Copyright International Business Machines Corporation 2020, 2022.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
Figures................................................................................................................ vii
Tables.................................................................................................................. ix
Appendix D. Configuring call home in ESS 5000, ESS 3000, ESS 3200, ESS 3500,
and ESS Legacy................................................................................................51
Disk call home for ESS 5000, ESS 3000, ESS 3200, ESS 3500, and ESS Legacy.................................... 51
Installing the IBM Electronic Service Agent............................................................................................. 53
Configuring call home on ESS systems..................................................................................................... 53
Configuring proxy for call home.................................................................................................................56
ESS call home logs and location................................................................................................................57
Overview of a problem report....................................................................................................................60
Problem details section of ESA............................................................................................................ 61
Call home monitoring of ESS 5000, ESS 3000, ESS 3200, and ESS Legacy systems and their
disk enclosures............................................................................................................................... 64
Upload data.......................................................................................................................................... 65
Uninstalling, reinstalling, and troubleshooting the IBM Electronic Service Agent.................................. 66
Test call home............................................................................................................................................ 66
Post setup activities...................................................................................................................................68
essinstallcheck enhancement of software and hardware call home ...................................................... 68
Call home pre-installation worksheets..................................................................................................... 68
Appendix I. ESS protocol node deployment by using the IBM Spectrum Scale
installation toolkit........................................................................................... 85
Notices..............................................................................................................147
Trademarks.............................................................................................................................................. 148
Terms and conditions for product documentation................................................................................. 148
Glossary............................................................................................................ 151
Index................................................................................................................ 159
Figures
11. ESA portal showing enclosures with drive replacement events .............................................................60
24. Switch port and switch markings........................................................................................................... 110
Tables
1. Conventions..................................................................................................................................................xv
About this information
IBM Elastic Storage System (ESS) 3000, ESS 3200, and ESS 5000 documentation consists of the following information units.
Related information
For information about:
• IBM Spectrum Scale, see IBM Documentation.
• mmvdisk command, see mmvdisk documentation.
• Mellanox OFED (MLNX_OFED_LINUX-4.9-4.1.7.2) Release Notes, go to https://docs.nvidia.com/networking/spaces/viewspace.action?key=MLNXOFEDv494080.
• Mellanox OFED (MLNX_OFED_LINUX-5.4-3.0.3.0) Release Notes, go to https://docs.mellanox.com/display/MLNXOFEDv543030/Release+Notes. (Actual version shipped is 5.5-x.)
• IBM Elastic Storage System, see IBM Documentation.
• IBM Spectrum Scale call home, see Understanding call home.
• Installing IBM Spectrum Scale and CES protocols with the installation toolkit, see Installing IBM
Spectrum Scale on Linux® nodes with the installation toolkit.
• Detailed information about the IBM Spectrum Scale installation toolkit, see Using the installation toolkit
to perform installation tasks: Explanations and examples.
• CES HDFS, see Adding CES HDFS nodes into the centralized file system.
• Installation toolkit ESS support, see ESS awareness with the installation toolkit.
• IBM POWER8® servers, see https://www.ibm.com/docs/en/power-sys-solutions/0008-ESS?topic=P8ESS/p8hdx/5148_22l_landing.htm
• IBM POWER9™ servers, see https://www.ibm.com/docs/en/ess/6.1.0_ent?topic=guide-5105-22e-reference-information.
For the latest support information about IBM Spectrum Scale RAID, see the IBM Spectrum Scale RAID
FAQ in IBM Documentation.
Table 1. Conventions

bold
    Bold words or characters represent system elements that you must use literally, such as commands, flags, values, and selected menu options. Depending on the context, bold typeface sometimes represents path names, directories, or file names.
bold underlined
    Bold underlined keywords are defaults. These take effect if you do not specify a different keyword.
constant width
    Examples and information that the system displays appear in constant-width typeface. Depending on the context, constant-width typeface sometimes represents path names, directories, or file names.
italic
    Italic words or characters represent variable values that you must supply. Italics are also used for information unit titles, for the first use of a glossary term, and for general emphasis in text.
<key>
    Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.
\
    In command examples, a backslash indicates that the command or coding example continues on the next line.
{item}
    Braces enclose a list from which you must choose an item in format and syntax descriptions.
[item]
    Brackets enclose optional items in format and syntax descriptions.
<Ctrl-x>
    The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.
item...
    Ellipses indicate that you can repeat the preceding item one or more times.
|
    In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line means Or. In the left margin of the document, vertical lines indicate technical changes to the information.
ESS 3500
    Runs on: POWER9 EMS
    I/O node OS: Red Hat® Enterprise Linux 8.4 x86_64
    Container version: Red Hat UBI 8.4
    Ansible®: 2.9.27-1
    xCAT: 2.16.3 (for internal use only; not on IBM Fix Central)
ESS 3200
    Runs on: POWER9 EMS
    I/O node OS: Red Hat Enterprise Linux 8.4 x86_64
    Container version: Red Hat UBI 8.4
    Ansible: 2.9.27-1
    xCAT: 2.16.3 (not used in the customer-shipped image; only for SCT)
ESS 3000
    Runs on: POWER8 or POWER9 EMS
    I/O node OS: Red Hat Enterprise Linux 8.4 x86_64
    Container version: Red Hat UBI 8.4
    Ansible: 2.9.27-1
    xCAT: 2.16.3
ESS 5000
    Runs on: POWER9 EMS
    I/O node OS: Red Hat Enterprise Linux 8.4 PPC64LE
    Container version: Red Hat UBI 8.4
    Ansible: 2.9.27-1
    xCAT: 2.16.3 (for SCT only)
ESS Legacy
    Runs on: POWER8 or POWER9 EMS
    I/O node OS: Red Hat Enterprise Linux 7.9 PPC64LE
    Container version: Red Hat UBI 8.4
    Ansible: 2.9.27-1
    xCAT: 2.16.3 (for SCT only)
NVDIMM ver: Bundled
BPM ver: Bundled
Support matrix

ESS 3500 6.1.4
    OS: Red Hat Enterprise Linux 8.4 (x86_64)
    Runs on: POWER9 EMS
    Can upgrade or deploy: ESS 3500 nodes, POWER9 EMS, POWER9 protocol nodes
ESS 3200 6.1.4
    OS: Red Hat Enterprise Linux 8.4 (x86_64)
    Runs on: POWER9 EMS
    Can upgrade or deploy: ESS 3200 nodes, POWER9 EMS, POWER9 protocol nodes
ESS 3000 6.1.4
    OS: Red Hat Enterprise Linux 7.9 (PPC64LE), Red Hat Enterprise Linux 8.4 (x86_64)
    Runs on: POWER8 EMS or POWER9 EMS
    Can upgrade or deploy: ESS 3000 nodes, POWER8 EMS, POWER9 EMS, POWER8 protocol nodes, POWER9 protocol nodes
ESS 5000 6.1.4
    OS: Red Hat Enterprise Linux 8.4 (PPC64LE)
    Runs on: POWER9 EMS
    Can upgrade or deploy: ESS 5000 nodes, POWER9 EMS, POWER9 protocol nodes
Prerequisites
• This document (ESS Software Quick Deployment Guide)
• SSR completes physical hardware installation and code 20.
– SSR uses Worldwide Customized Installation Instructions (WCII) for racking, cabling, and disk
placement information.
– SSR uses the respective ESS Hardware Guide (ESS 3000 or ESS 5000 or ESS 3200 or ESS 3500) for
hardware checkout and setting IP addresses.
• Worksheet notes from the SSR
• Latest ESS xz downloaded to the EMS node from IBM Fix Central (if a newer version is available).
– Data Access Edition or Data Management Edition: Must match the order. If the edition does not match your order, open a ticket with IBM Service.
• High-speed switch and cables have been run and configured.
• Low-speed host names are ready to be defined based on the IP addresses that the SSR has configured.
• High-speed host names (suffix of low speed) and IP addresses are ready to be defined.
• Container host name and IP address are ready to be defined in the /etc/hosts file.
• Host and domain name (FQDN) are defined in the /etc/hosts file.
• ESS Legacy 6.1.x.x Only: You must convert to mmvdisk before deploying the ESS Legacy 6.1.x.x
container if you are coming from a non-container version such as ESS 5.3.x.x. If you have not done so
already, convert to mmvdisk by using the following steps:
1. Check whether there are any mmvdisk node classes.
There should be one node class per ESS Legacy building-block. If the command output does not
show mmvdisk for your ESS Legacy nodes, convert to mmvdisk before running the ESS Legacy
6.1.0.x container.
2. Convert to mmvdisk by running the following command from one of the POWER8 I/O nodes or from
the POWER8 EMS node.
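A minimal sketch of these two steps, assuming a legacy building block whose paired recovery groups are named rgL and rgR and a node class name of your choosing (all names are placeholders); see the mmvdisk documentation for the exact options:

# Step 1: list existing mmvdisk node classes (no node class for the building block means conversion is still needed)
mmvdisk nodeclass list
# Step 2: convert the legacy recovery-group pair to mmvdisk management
mmvdisk recoverygroup convert --recovery-group rgL,rgR --node-class ess_legacy_bb1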
You can check whether an ESS or IBM Spectrum Scale RPM is signed by IBM as follows.
1. Import the IBM PGP key.
2. Check the signature of the RPM file:
rpm -K RPMFile
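A short sketch of both steps; the key file path is a placeholder (use the public key provided with the ESS or IBM Spectrum Scale package), and a correctly signed package reports its digests and signatures as OK:

# Import the IBM PGP public key (placeholder path)
rpm --import /path/to/IBM_Spectrum_Scale_public_key.pgp
# Verify the package signature
rpm -K RPMFile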
ESS 3000, ESS 5000, ESS 3500, and ESS Legacy networking requirements
In any scenario, you must have an EMS node and a management switch. The management switch must be split into two VLANs.
• Management VLAN
• Service/FSP VLAN
Note: To future-proof your environment for ESS 3200, modify any existing management switches to the new VLAN configuration. For more information, see Appendix O, “Switch VLAN configuration instructions,” on page 109.
You also need a high-speed switch (IB or Ethernet) for cluster communication.
ESS 3000
POWER8 or POWER9 EMS
It is recommended to buy a POWER9 EMS with ESS 3000. If you have a legacy environment (POWER8),
it is recommended to migrate to IBM Spectrum Scale 5.1.x.x and use the POWER9 EMS as the single
management server.
• If you are adding ESS 3000 to a POWER8 EMS:
– An additional connection for the container to the management VLAN must be added. A C10-T2 cable
must be run to this VLAN.
– A public/campus connection is required in C10-T3.
– A management connection must be run from C10-T1 (This should be already in place if adding to an
existing POWER8 EMS with legacy nodes).
– Port 1 on each ESS 3000 canister must be connected to the management VLAN.
• If you are using an ESS 3000 with a POWER9 EMS:
– C11-T1 must be connected on the EMS to the management VLAN.
– Port 1 on each ESS 3000 canister must be connected to the management VLAN.
– C11-T2 must be connected on the EMS to the FSP VLAN.
– HMC1 must be connected on the EMS to the FSP VLAN.
Figure: ESS 3200 and ESS 5000 network connections. The management/provisioning network (192.168.x.x/24) and the FSP/service network (10.x.x.x/24) are separate VLANs on the management switch. On the ESS 3200, Ethernet 1 carries both the OS address (192.168.x.x/24) and the BMC address (10.x.x.x/24) on the same physical port, alongside the SSR serial port. Each POWER9 ESS 5000 I/O server (essio1, essio2) connects C11-T1 (192.168.x.x/24) to the management VLAN, HMC 1 to the FSP VLAN, and C11-T4 (10.111.222.101/30) as the SSR port.
The ports highlighted in green are the ESS 3200 trunk ports. These are special ports that are for the ESS
3200 only. The reason for these ports is that each ESS 3200 canister has a single interface for both the
BMC and the OS but unique MAC addresses. By using a VLAN tag, canister BMC MAC addresses are routed
to the BMC/FSP/Service VLAN (Default is 101).
IBM racked orders have the switch preconfigured. Only the VLAN tag needs to be set. If you have an
existing IBM Cumulus switch or customer supplied switch, it needs to be modified to accommodate the
ESS 3200 trunk port requirement. For more information, see Appendix O, “Switch VLAN configuration
instructions,” on page 109.
Note: It is mandatory that you connect C11-T3 to a campus connection or run an additional management
connection. If you do not do this step, you will lose the connection to the EMS node when the container
starts.
ESS 3500 network requirements
ESS Legacy
POWER8 or POWER9 EMS supported
POWER8 EMS must have the following connections:
• C10-T1 to the management VLAN
• C10-T4 to the FSP/Service VLAN
• C10-T2 to the management VLAN
• C10-T3 optional campus connection
• HMC1 to the FSP/Service VLAN
POWER9 EMS must have the following connections:
• C11-T1 to the management VLAN
• C11-T2 to the FSP VLAN
• HMC1 to the FSP VLAN
• C11-T3 to the campus or management network/VLAN
POWER8 nodes:
• C12-T1 to the management VLAN
• HMC1 to the FSP VLAN
Code version
ESS 3000, ESS 3200, ESS 5000, and ESS 3500 releases are included in ESS 6.1.4.x with two editions:
Data Management Edition and Data Access Edition. An example of package names is as follows:
ess_6.1.3.1_0623-14_dme_ppc64le.tar.xz
ess_6.1.3.1_0623-14_dae_ppc64le.tar.xz
Note:
• The versions shown here might not be the GA version that is available on IBM Fix Central. It is recommended to go to IBM Fix Central and download the latest code.
Note: The container installs and runs on the EMS node only. Only Power-based EMS nodes are supported; running the container on an x86-based node is not supported at this time.
POWER8 considerations
If you are moving from an xCAT-based release (5.3.x) to a container-based release (6.1.x.x), the following
considerations apply:
• You must add an additional management network connection to C10-T2.
• A public or additional management connection is mandatory in C10-T3.
• You must stop and uninstall xCAT and all xCAT dependencies before installing the container:
a. Stop xCAT.
b. Uninstall xCAT.
c. Remove dependencies.
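A sketch of these steps on the POWER8 EMS host; the exact package set on your system may differ, so review the list that yum reports before confirming the removal:

# Stop the xCAT daemon
systemctl stop xcatd
# Remove the xCAT packages and any now-unneeded dependencies
yum remove 'xCAT*'
yum autoremove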
Other notes
• The following tasks must be complete before starting a new installation (tasks done by manufacturing
and the SSR):
– SSR has ensured all hardware is clean, and IP addresses are set and pinging over the proper
networks (through the code 20 operation).
– /etc/hosts is blank.
– The ESS tgz file (for the correct edition) is in the /home/deploy directory. If upgrade is needed,
download from Fix Central and replace.
– Network bridges are cleared.
– Images and containers are removed.
– SSH keys are cleaned up and regenerated.
– All code levels are at the latest at time of manufacturing ship.
• Customer must make sure that the high-speed connections are cabled and the switch is ready before
starting.
• All node names and IP addresses in this document are examples.
• The root password, if changed, should be the same on each node, if possible. The default password is ibmesscluster. It is recommended to change the password after deployment is completed.
• If the hostid on any node is not unique, you must fix it by running genhostid. These steps must be done when creating a recovery group in a stretch cluster.
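A quick way to check for duplicate host IDs across the nodes (the node names are the examples used in this guide):

# Print the hostid of each node; all values must be unique
for n in ems1 essio1 essio2; do echo -n "$n: "; ssh $n hostid; done
# If a duplicate is found, regenerate the hostid on the affected node
ssh essio2 genhostid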
• Consider placing your protocol nodes in file system maintenance mode before upgrades. This is
not a requirement but you should strongly consider doing it. For more information, see File system
maintenance mode.
• Do not try to update the EMS node while you are logged in over the high-speed network. Update the
EMS node only through the management or the campus connection.
• After adding an I/O node to the cluster, run the gnrhealthcheck command to ensure that there are no issues, such as duplicate host IDs, before creating vdisk sets. Duplicate host IDs cause issues in the ESS environment.
• Run the container from a direct SSH connection. Do not SSH from an I/O node or any node that might be
rebooted by the container.
Note: If you have protocol nodes, add them to the commands provided in these instructions. The
default /etc/hosts file has host names prt1 and prt2 for protocol nodes. You might have more than
two protocol nodes.
1. Log in to the EMS node by using the management IP (set up by SSR by using the provided worksheet).
The default password is ibmesscluster.
2. Set up a campus or a public connection (interface enP1p8s0f2; the connection might be named 'campus'). Connect an Ethernet cable from C11-T3 on the EMS node to your lab network.
This connection serves as a way to access the GUI or the ESA agent (call home) from outside of the management network. The container creates a bridge to the management network; therefore, having a campus connection is highly advised.
Note: It is recommended but not mandatory to set up a campus or public connection. If you do not
set up a campus or a public connection, you will temporarily lose your connection when the container
bridge is created in a later step.
This method is for configuring the campus network only, not any other network in the EMS node. Do not modify the T1, T2, or T4 connections after they are set by the SSR, and use the SSR method only to configure T1 and T2 (if a change is mandatory after the SSR is finished). This includes renaming the interface, setting the IP address, or any other interaction with those interfaces.
You can use the nmtui command to set the IP address of the campus interface. For more
information, see Configuring IP networking with nmtui tool.
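If you prefer the command line to nmtui, a minimal sketch with nmcli, assuming a connection named campus already exists for the enP1p8s0f2 interface (the addresses are placeholders):

nmcli con mod campus ipv4.method manual ipv4.addresses 192.0.2.10/24 ipv4.gateway 192.0.2.1
nmcli con up campus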
3. Complete the /etc/hosts file on the EMS node. This file must contain the low-speed (management)
and high-speed (cluster) IP addresses, FQDNs, and short names. The high-speed names must
contain a suffix to the low-speed names (For example, essio1-hs (high-speed name) to essio1 (low-
speed name)). This file must also contain the container host name and the IP address.
Note:
• localdomain.local is just an example and cannot be used for deployment. You must change it
to a valid fully qualified domain name (FQDN) during the /etc/hosts setup. The domain must be
the same for each network subnet that is defined. Also, ensure that you set the domain on the EMS
node (hostnamectl set-hostname NAME).
NAME must be the FQDN of the management interface (T1) of the EMS node. If you need to set
other names for campus, or other interfaces, those names must be the alias but not the main host
name as returned by the hostnamectl command.
You can set up the EMS FQDN manually or wait until prompted when the ESS deployment binary is started. At that time, the script confirms the FQDN and gives the user a chance to make changes.
• If you are planning to set up a supported ESS system with the p9 EMS node, add new ESS host
names to /etc/hosts by using the same structure. For example, low-speed (management) and
high-speed (cluster) IP addresses, FQDNs, and short names.
• Do not use any special characters, underscores, or dashes in the host names other than the high-speed suffix (example: -hs). Doing so might cause issues with the deployment procedure.
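A minimal /etc/hosts sketch that follows these rules; all addresses, the example.com domain, and the container name cems0 are placeholders:

192.168.45.20   ems1.example.com       ems1       # EMS, low-speed (management)
192.168.45.21   essio1.example.com     essio1
192.168.45.22   essio2.example.com     essio2
192.168.45.80   cems0.example.com      cems0      # container host name
172.16.45.20    ems1-hs.example.com    ems1-hs    # high-speed names add the -hs suffix
172.16.45.21    essio1-hs.example.com  essio1-hs
172.16.45.22    essio2-hs.example.com  essio2-hs

With this layout, the EMS host name would be set with hostnamectl set-hostname ems1.example.com (the FQDN of the management interface).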
4. Clean up the old containers and images.
Bridges are cleaned up automatically. However, if you want to clean up bridges manually, complete
the following steps. An option is available to prevent cleanup if desired.
Note: Typically, this is applicable only for upgrades.
If podman is not installed, install it. For more information about the podman installation, see the step in which the deployment binary is started with the --start-container option:
./ess_6.1.3.1_0623-14_dme_ppc64le --start-container
Note: If a node was reinstalled, podman will not be available. Thus, you can skip the cleanup.
a. List the existing containers and images.
podman ps -a
podman images
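To remove an old container and its image, a sketch that uses the same cleanup commands shown later in this guide (ContainerName and ImageID come from the output of the two commands above):

podman stop ContainerName
podman rm ContainerName -f
podman image rm ImageID -f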
i) List the network connections.
nmcli c
ii) Clean up any existing bridges before the new container is set up. The bridge names must be mgmt_bridge and fsp_bridge. After the bridges are removed, bring the FSP and management connections back up:
nmcli c up fsp
nmcli c up mgmt
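The bridge removal itself can also be done with nmcli; a minimal sketch, assuming the default bridge connection names listed above, to be run before the two up commands:

nmcli c delete mgmt_bridge
nmcli c delete fsp_bridge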
• If you are using a POWER8 EMS and converting from the xCAT-based deployment to container, you
must first stop and uninstall xCAT as follows.
6. Stop the GUI temporarily until upgrade or conversion from xCAT deployment to container is
complete.
When you are updating the EMS, shut down the GUI. Do not start the GUI until you finish upgrading
the EMS. Because the GUI is shut down, any containers that access the ESS storage through the REST
API cannot access the storage temporarily.
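A minimal sketch of stopping the GUI and the performance collector on the EMS node, assuming the standard gpfsgui and pmcollector service names used by the ESS GUI and performance monitoring:

systemctl stop gpfsgui
systemctl stop pmcollector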
7. Extract the installation package.
Note: Ensure that you check the version that is installed from manufacturing (SSR worksheet). If
there is a newer version available on Fix Central, replace the existing image in /home/deploy with
the new image and then remove the old tgz file before doing this step.
cd /home/deploy
xz --decompress ess_6.1.3.1_0623-14_dme_ppc64le.tar.xz
tar xvf ess_6.1.3.1_0623-14_dme_ppc64le.tar
ess_6.1.3.1_0623-14_dme_ppc64le
ess_6.1.3.1_0623-14_dme_ppc64le.sha256
During this step, you are first prompted to accept the license agreement. Press 1 to accept. You are
then prompted to input answers to 3 questions before the installation starts (2 questions for ESS
3000).
• Confirm or set EMS FQDN.
• Provide the container short name.
• Provide a free IP address on the FSP subnet for the container FSP connection.
Example of contents of the extracted installation package:
├── 70-persistent-net-ems.rules
├── classes
│   ├── essmgr.py
│   ├── essmgr_yml.py
│   ├── __init__.py
│   └── __pycache__
│       ├── essmgr.cpython-36.pyc
│       ├── essmgr_yml.cpython-36.pyc
│       └── __init__.cpython-36.pyc
├── ess_6.1.3.1_0623-14_dme_ppc64le_binaries.iso
├── ess_6.1.3.1_0623-14_dme_ppc64le.tar
├── essmgr
├── essmgr_p8.yml
├── essmgr_p9.yml
├── essmgr.yml
├── essmkyml
├── logs
│   ├── essmgr.yml_2022-05-16_18-16-28
│   └── essmkyml_2022-05-16_18-16-28_log
├── podman_rh7_ppc64le.tgz
├── podman_rh8_ppc64le.tgz
├── python3_rh7_ppc64le.tgz
├── python3-site-packages_rh7_ppc64le.tgz
├── Release_note.ess_6.1.3.1_0623-14_dme_ppc64le.txt
├── rhels-7.9-server-extra.iso
├── rhels-7.9-server-ppc64le.iso
├── rhels-8.4-server-extra.iso
├── rhels-8.4-server-ppc64le.iso
└── rhels-8.4-server-x86_64.iso
Please type the desired and resolvable short hostname [ess5k-cems0]: cems0
• Remember that the IP address must belong to the 10.0.0.x/24 network block (It is assumed that
the recommended FSP network was used):
Note: The values in brackets ([ ]) are just examples or the last entered values.
If all of the checks pass, the essmgr.yml file is written and you can proceed to bridge creation, if
applicable, and running the container.
Note: The preceding questions apply to ESS 3200, ESS 3500, ESS 5000, and ESS Legacy on the POWER9 EMS. If you are on the POWER8 EMS (ESS 3000 or ESS Legacy only), you are asked for the EMS host name, container name, and FSP bridge IP.
At this point, if all checks are successful, the image is loaded and container is started. Example:
9. Check and fix passwordless SSH on the EMS and all nodes by using essutils.
10. Run the essrun config load command. This command determines the node information based on VPD and also exchanges the SSH keys (a command sketch follows the notes below).
Note:
• Always include the EMS in this command along with all nodes of the same type in the building-
blocks.
• Use the low-speed management host names. Specify the root password with -p.
• The password (-p) is the root password of the node. By default, it is ibmesscluster. Consider
changing the root password after deployment is complete.
• The config load command needs to be run with -N on all the nodes of the cluster to update the hosts.yml file for the BMC. If config load is run on selected nodes only (for example, only on the essio nodes and then the EMS node), the hosts.yml file is overwritten with only the nodes from the latest config load.
After this command is run, you can use -N NodeGroup for future essrun steps (For example, -N
ess_ppc64le). There are different node group names for ESS 3000 and ESS Legacy.
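A sketch of the config load invocation, using the example host names from this guide and the default root password; see the essrun command reference for the exact syntax:

essrun -N essio1,essio2,ems1 config load -p ibmesscluster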
11. Run the essrun config check command. This command checks the various nodes for potential issues prior to upgrade. Review the output carefully and make changes as needed before proceeding.
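A sketch, again using the example host names; after config load, the node groups described later in this chapter can be used instead:

essrun -N essio1,essio2,ems1 config check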
Use these instructions if you are deploying a new cluster or a new file system.
Note: The POWER8 or POWER9 firmware is not automatically upgraded by the essrun automation. For
information about manually upgrading the server firmware, see Appendix G, “Upgrading the POWER9
firmware,” on page 81. You may use the essinstallcheck command to determine if a firmware
upgrade is required after upgrading to ESS 6.1.3.1.
Before you start with these steps, you must complete the steps in Chapter 3, “ESS common installation
instructions,” on page 21.
The following steps are covered in this topic:
• Upgrading the EMS and I/O nodes, if required.
• Creating network bonds.
• Creating the cluster.
• Adding the EMS node to the cluster.
• Creating the file system.
• Configuring performance monitoring and starting the GUI.
• Setting up call home.
• Setting up time server.
• Final health checks.
Note: You can update by using the management (low-speed) node names or, after config load is run, by using a group of nodes. The groups are as follows:
• PPC64LE - ESS 5000 and ESS Legacy: ess_ppc64le
• x86_64 - ESS 3000: ess_x86_64
When the group is referenced in these instructions, ess_ppc64le is used as an example. If you are in an
ESS 3000 environment, use ess_x86_64.
For the EMS node, you can use the group ems.
At this point, the user has already determined whether an upgrade is required. If the version initially
found in /home/deploy on the EMS node is earlier than the latest available on IBM Fix Central, the latest
version should be already downloaded and deployed according to Chapter 3, “ESS common installation
instructions,” on page 21.
1. If an upgrade is required, upgrade the EMS node.
Please enter 'accept' indicating that you want to update the following list of nodes: ems1
>>> accept
Note:
• If the kernel is changed, you are prompted to leave the container, reboot the EMS node, restart the
container, and run this command again.
For example:
./essmgr -r
essrun -N ems1 update --offline
• You cannot upgrade a POWER8 EMS currently running ESS Legacy code (5.3.x with xCAT control) from an ESS 3000 container. If xCAT is installed on the host, you must first uninstall it and clean up any dependencies before attempting an EMS upgrade from the container. If an ESS Legacy (xCAT-based) deployment is still needed, do not remove xCAT and continue to deploy ESS Legacy. Otherwise, remove xCAT and use the container to upgrade the EMS and I/O nodes.
Note: To check which nodes belong to certain node classes, issue the mmlsnodeclass command (see the example after these notes).
• When you are updating the EMS, shut down the GUI. Do not start the GUI until you finish upgrading
the EMS. Because the GUI is shut down, any containers that access the ESS storage through the
REST API cannot access the storage temporarily.
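The node class check mentioned in the note above can be done with mmlsnodeclass; with no arguments it lists the user-defined node classes, which include the mmvdisk node classes:

mmlsnodeclass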
2. If required, update the I/O nodes.
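A sketch of the I/O node update from inside the container, using the PPC64LE node group as the example (drop --offline for an online update):

essrun -N ess_ppc64le update --offline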
You can create bonds on I/O and EMS nodes at the same time:
ssh essio1
ESSENV=TEST essnettest -N essio1,essio2 --suffix=-hs
Note: When you SSH to an I/O node, you exit from the container.
This command performs the test with an optional RDMA test afterward if there is InfiniBand. Ensure
that there are no errors in the output indicating dropped packets have exceeded thresholds. When
completed, type exit to return back to the container to create a cluster.
6. Create the cluster, and then create the file system (a command sketch follows these notes).
Note:
• By default, this command attempts to use all the available space. If you need to create multiple file systems or a CES shared root file system for protocol nodes, consider using less space.
For more options such as blocksize, filesystem size, or RAID code, see the essrun command in the
ESS Command Reference.
• This step creates combined metadata + data vdisk sets by using a default RAID code and block
size. You can use additional flags to customize or use the mmvdisk command directly for advanced
configurations.
• If you are updating ESS 3000, ESS 3200, or ESS 3500, the default set-size is 80% and it must not be increased. If you are updating ESS 5000 or ESS Legacy, the default set-size is 100%. For additional options, see the essrun command. The default block size for PPC64LE is 16M, whereas for ESS 3000 it is 4M.
• If you are deploying protocol nodes, make sure that you leave space for CES shared root file system.
Adjust the set-size slightly lower when you are creating this required file system for protocol nodes.
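A sketch of the cluster and file system creation for this step, using the example I/O node names and the -hs high-speed suffix; the exact options (set-size, block size, RAID code, file system name) are described in the essrun command reference:

essrun -N essio1,essio2 cluster --suffix=-hs
essrun -N essio1,essio2 filesystem --suffix=-hs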
essinstallcheck -N localhost
Doing this step verifies that all software and cluster versions are up to date.
Note: For ESS 5000, a check is added that flags whether the WCE bit is enabled on any drive. If the WCE bit is enabled, refer to the published flash for the recommended action.
2. From the EMS node, outside of the container, run the following final health check commands to verify your system health.
gnrhealthcheck
mmhealth node show -a
Note: The POWER8 or POWER9 firmware is not automatically upgraded by the essrun automation. For
information about manually upgrading the server firmware, see Appendix G, “Upgrading the POWER9
firmware,” on page 81. You may use the essinstallcheck command to determine if a firmware
upgrade is required after upgrading to ESS 6.1.3.1.
Warning: You must have a clean and healthy system before starting any ESS upgrade (online or offline). At a minimum, the following commands must run free of errors when run on any node outside of the container:
gnrhealthcheck
mmhealth node show -a
mmnetverify -N all
You can also run the essrun healthcheck command instead, from inside the container.
Important: To upgrade the system from 6.0.x.x to 6.1.3.x, you first need to upgrade your system to
6.1.0.0, then you can upgrade the system to the desired version (6.1.3.x).
Upgrade can be done by using the following methods:
• Offline upgrade: This method requires GPFS to be shut down on the given node or nodes before beginning. It is faster than an online update because nodes are upgraded in parallel, including firmware, but the system is typically taken down for a period of time.
• Online upgrade: This method allows the cluster to stay fully available; the code is typically updated one node per building block in parallel.
Note:
• The EMS node and protocol node upgrades are available only in the offline mode.
• Where NodeClass is an ESS 3000, ESS 5000, or ESS Legacy node class. For more information, see the
mmlsnodeclass command in IBM Spectrum Scale: Command and Programming Reference.
Online upgrade assumptions (I/O nodes only):
• The cluster is created with EMS, one or more ESS nodes, and optionally one or more ESS building blocks
or protocol nodes.
• The file system is built and recovery groups are active and healthy.
• GPFS is active on all ESS nodes and quorum is achieved.
• New container is installed that will update the code on the EMS and I/O nodes.
• GUI and collector services are stopped on the EMS before starting the upgrade.
Before starting the online upgrade, make sure that all ESS nodes are active by running the following
command from one of the cluster nodes:
mmgetstate -N NodeClass
Where NodeClass is your ESS 3000, ESS 5000, or ESS Legacy node class. For more information, see
mmlsnodeclass command.
Offline upgrade assumptions (EMS or protocol nodes only):
• You assume the risks of potential quorum loss.
• The GPFS GUI and collector must be down.
If the kernel version changed during the update, you are prompted to exit the container, reboot, rerun the container, and rerun the update command. After the reboot and after restarting the container, run the EMS node update again.
Note:
• You cannot upgrade a POWER8 EMS currently running ESS Legacy code (5.3.x with xCAT control) from an ESS 3000 container. If xCAT is installed on the host, you must first uninstall it and clean up any dependencies before attempting an EMS upgrade from the container. Remove xCAT only if the legacy (xCAT-based) deployment is no longer needed, typically when you are moving to the ESS Legacy 6.1.0.x container. If you are still using an ESS Legacy deployment (5.3.x), update the EMS by using the upgrade instructions outlined in the ESS 5.3.x Quick Deployment Guide.
• When you are updating the EMS, shut down the GUI. Do not start the GUI until you finish upgrading
the EMS. Because the GUI is shut down, any containers that access the ESS storage through the
REST API cannot access the storage temporarily.
3. Update the protocol nodes.
4. Run installation check on each node type by logging in to EMS node and protocol nodes.
essinstallcheck
Add the --offline option when you attempt an offline-only upgrade.
Important: For an offline update, GPFS must be down in the ESS cluster. The GPFS status is checked; if GPFS is up on a given node, you are asked whether it is OK to shut it down.
If you want to do an online update of I/O nodes, refer to “Update ESS I/O nodes online” on page 33.
• Update by using the group of all configured ESS nodes.
These command examples show ESS 5000 node and node classes, but you can use these commands
with any of the supported ESS node types, such as ESS 3200 and ESS 3500.
After offline update is done, proceed to starting GPFS on the nodes.
6. Run installation check on each node from outside the container.
essinstallcheck
Note: For ESS 5000, a check is added that flags whether the WCE bit is enabled on any drive. If the WCE bit is enabled, refer to the published flash for the recommended action.
7. Start GPFS on all nodes.
Note: If any protocol nodes are updated, ensure that you restart CES services on those nodes.
Note: Consider using the --serial option for online upgrade. This option allows you to perform an online update one node at a time (or at the required level of parallelism). If the --serial option is not specified, --serial 1 is the default. For example, the command performs an online update of two building blocks, but one node at a time. See “Serial option for online upgrade” on page 34 for details.
2. Run installation check on each updated node.
essinstallcheck
Note: For ESS 5000, a check is added that flags whether the WCE bit is enabled on any drive. If the WCE bit is enabled, refer to the published flash for the recommended action.
3. Change the autoload parameter to enable GPFS to automatically start on all nodes.
mmchconfig autoload=yes
gnrhealthcheck
mmhealth node show -a
mmnetverify -N all
Issue: The Ansible tool essrun cannot add more than one building block at a time in a cluster.
Resolution: If it is necessary to add more than one building block in a cluster, two options are available.

Issue: During an upgrade, if the container had an unintended loss of connection with the target canister(s), there might be a timeout of up to 2 hours in the Ansible update task.
Product: ESS Legacy, ESS 3000, ESS 3200, ESS 3500, ESS 5000
Resolution: Wait for the timeout and retry the essrun update task.

Issue: When running essrun commands, you might see messages such as these:
Thursday 16 April 2020 20:52:44 +0000 (0:00:00.572) 0:13:19.792 ********
Thursday 16 April 2020 20:52:45 +0000 (0:00:00.575) 0:13:20.367 ********
Thursday 16 April 2020 20:52:46 +0000 (0:00:00.577) 0:13:20.944 ********
Product: ESS Legacy, ESS 3000, ESS 3200, ESS 3500, ESS 5000
Resolution: This is a restriction in the Ansible timestamp module. It shows timestamps even for the "skipped" tasks. If you want to remove timestamps from the output, change the ansible.cfg file inside the container as follows:
1. vim /etc/ansible/ansible.cfg
2. Remove ,profile_tasks on line 7.
3. Save and quit: esc + :wq

Issue: After a reboot of an ESS 5000 node, systemd could be loaded incorrectly. Users might see the following error when trying to start GPFS:
Failed to activate service 'org.freedesktop.systemd1': timed out
Product: ESS 5000
Resolution: Power off the system and then power it on again.
1. Run the following command from the container:
rpower <node name> off
2. Wait for at least 30 seconds and run the following command to verify that the system is off:
rpower <node name> status
3. Restart the system with the following command:
rpower <node name> on

Issue: In the ESS 5000 SLx series, after a hard drive is pulled out for a long time and the drive has finished draining, the drive must be revived manually.
Product: ESS 5000
Resolution: Run the following command from the EMS or an I/O node to revive the drive, where RGName is the recovery group that the drive belongs to and PdiskName is the drive's pdisk name.
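A sketch of the revive command using the mmvdisk pdisk interface; RGName and PdiskName are the placeholders described above:

mmvdisk pdisk change --recovery-group RGName --pdisk PdiskName --revive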
Issue: After the deployment is complete, if firmware on the enclosure, drive, or HBA adapter does not match the expected level, and you run essinstallcheck, the following mmvdisk settings related error message is displayed:
[ERROR] mmvdisk settings do NOT match best practices.
Run mmvdisk server configure --verify --node-class ess5k_ppc64le_mmvdisk to debug.
Product: ESS Legacy, ESS 3000, ESS 3200, ESS 3500, ESS 5000
Resolution: The error about mmvdisk settings can be ignored. The resolution is to update the mismatched firmware levels on the enclosure, drive, or HBA adapters to the correct levels. You can run the mmvdisk configuration check to confirm:
mmvdisk server configure --verify --node-class <nodeclass>
List the mmvdisk node classes: mmvdisk nc list
Note: essinstallcheck detects inconsistencies from mmvdisk best practices for all node classes in the cluster and stops immediately if an issue is found.

Issue: When running essinstallcheck you might see an error message similar to:
System Firmware could not be obtained
which will lead to a false-positive PASS message when the script completes.
Resolution: Run vpdupdate on each I/O node. Rerun essinstallcheck, which should properly query the firmware level.

Issue: During command-less disk replacement, there is a limit on how many disks can be replaced at one time.
Product: ESS 3000, ESS 3200, ESS 3500, ESS 5000
Resolution: For command-less disk replacement, only replace up to 2 disks at a time. If command-less disk replacement is enabled and more than 2 disks are replaceable, replace the first 2 disks, and then use the commands to replace the 3rd and subsequent disks.

Issue: An issue is reported with command-less disk replacement warning LEDs.
Product: ESS 5000
Resolution: The replaceable disk will have the amber LED on, but not blinking. Disk replacement should still succeed.

Issue: After upgrading an ESS node, the pmsensors service needs to be manually started.
Resolution: After the ESS upgrade is complete, the pmsensors service does not automatically start. You must manually start the service for performance monitoring.
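A sketch of starting the sensor service manually on each affected node, assuming the standard pmsensors systemd unit used by performance monitoring:

systemctl start pmsensors
systemctl status pmsensors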
Issue: The canister_failed event does not surface the amber LED on the canister or on the enclosure LED front panel.
Product: ESS 3200, ESS 3500
Resolution: Root cause: The failed canister is not the master canister, and the other canister is not up and running. Action required: No.

Issue: Migration from ESS Legacy releases (5.3.7.x) to the container version (ESS 6.1.x.x) might revert values in mmvdisk to default settings.
Product: ESS Legacy
Resolution: For more information about this issue, see IBM Support.

Issue: Node call home might not work for nodes that are designated as protocol nodes. If a power supply is damaged or pulled (or any other Opal-related node problem occurs), a call home is not available on the Salesforce system. Opal PRD might not log the error from the FSP that is causing this issue.
Product: ESS 5000
Resolution: Determine a power supply problem by manually inspecting the ASMI error/event logs by using the FSP, and open a problem with support if required.

Issue: If the essrun gui --configure command is run after the GUI and performance monitoring are already set up, you might get an error prompting you to remove any existing GUI configuration before continuing.
Product: ESS 3200, ESS 3500
Resolution: If the GUI is already set up, it is not required to remove the existing GUI configuration. Exit the container.
1. Run the mmhealth node show gui -a command. Verify that the performance sensors and collectors are healthy.
2. Verify that the GUI daemon is started:
https://GUI_Node_IP
Issue: The mmcallhome ticket list command still reports "New Case Opened" after the PMR is closed by IBM.
Product: ESS 3500
Resolution: Remove the ticket:
mmcallhome ticket delete <ticket number TSxxxxxxx>
Issue: After deploying the protocol VM on an ESS 3500 canister, the Mellanox OFED driver is not installed. Example:
ofed_info -s
-bash: ofed_info: command not found
Resolution:
1. Manually run the ess_ofed postscript after the VM is deployed.
2. Log in to the VM and run the following command to install the driver manually:
/opt/ibm/ess/tools/postscripts/ess_ofed.essvm

Issue: Cannot create the CES file system if the I/O nodes are deployed with versions prior to 6.1.2.0 by using the essrun command. Ansible tries to gather the recovery group by using the new name format. Example:
ess5k_essio1_ib_essio2_ib
Example of the old naming convention:
• ess5k_7894DBA
• ess5k_7894E4A
Product: ESS 3200, ESS 3500, ESS 5000
Resolution: Create the CES file system by using the mmvdisk command directly on the EMS or any I/O node in the cluster.
1. Gather the desired recovery group name(s):
mmvdisk nc list
2. Define the vdisk set:
mmvdisk vs define --vs vs_cesSharedRoot_essio1_hs_essio2_hs --rg ess5k_7894DBA,ess5k_7894E4A --code 8+2p --bs 4M --ss 20G
3. Create the vdisk set:
mmvdisk vs create --vs vs_cesSharedRoot_essio1_hs_essio2_hs
4. Create and mount the file system:
mmvdisk fs create --fs cesSharedRoot --vs vs_cesSharedRoot_essio1_hs_essio2_hs --mmcrfs -T /gpfs/cesSharedRoot
mmmount cesSharedRoot -a

Issue: During file system creation in mixed environments (ESS 5000 and ESS 3500), the following error can appear:
TASK [/opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate: Define Vdiskset] **************
Resolution: Issue the following command only one time in the container:
sed -i "s/enclQty/hostvars[item].enclQty/g" /opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate/tasks/create_filesystem_mixed.yml
Issue: The BMC network might become unresponsive when configured with a VLAN because the VLAN configuration failed to properly activate in the BMC network stack.
Resolution:
1. Log in to the canister corresponding to the BMC.
2. Unconfigure the VLAN:
ipmitool lan set 1 vlan id off
Issue: The amber LED on the power supply might flash or turn solid without any amber LED on the front of the enclosure. The power supply might incorrectly detect out-of-range operating parameters such as incoming voltage or power supply temperature.
Product: ESS 3500
Resolution:
1. Contact Service if the power supply presents a false positive status.
2. Run the mmhealth command:
mmhealth node show NATIVE_RAID
If the failure is real, it shows NATIVE_RAID->ENCLOSURE DEGRADED.
3. Review the mmhealth command output for power supply related issues.
Workaround (ESS 3500): Stop the system health monitor, move the call home FTDC files aside, restore them, and restart the monitor:
mmsysmoncontrol stop
mv /tmp/mmfs/callhome/incomingFTDC2CallHome/* /tmp/unifiedCallhome
mv /tmp/unifiedCallhome/* /tmp/mmfs/callhome/incomingFTDC2CallHome
mmsysmoncontrol start
3. Run config load against the existing nodes and the new nodes.
5. Outside the container, run the essnettest command against the new nodes.
Run this command from one of the nodes in the cluster and use this command to test the health of
the high-speed network connections. For more information, see the essnettest command help.
6. Add new nodes to the existing cluster by submitting the following command from the container:
Note: Add building blocks of only the same node type at a time. For example:
7. Create node class and recovery groups by submitting the following command from the container:
8. Get the list of recovery groups by submitting the following command outside the container:
mmvdisk rg list
Note: Assume that the existing file system has the attributes that are mentioned in the command.
For vdisk set name, use a unique name similar to the existing IBM Elastic Storage System vdisk
set.
For more information, see mmvdisk online command reference.
10. Add the vdisk set to an existing file system by submitting the following command:
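A sketch of this add step, assuming the existing file system is named fs1 and the new vdisk set from the previous step is named vs_fs1_add (hypothetical names); the mmrestripefs command that follows rebalances data onto the new capacity:

mmvdisk filesystem add --file-system fs1 --vdisk-set vs_fs1_add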
mmrestripefs fs1 -b
cd /home/deploy/ess_6.1.3.1_0623-14_dme_ppc64le.dir
./essmgr -r
essrun -N essio3,essio4 gui --add-to-hosts
Note: Ensure that the passwordless ssh is configured from new nodes to EMS.
2. Check whether GUI web page shows new ESS nodes.
Proceed to add the new nodes as sensors, re-configure call home, and rerun the GUI wizard to identify the
new nodes.
4. To create network bond connections for the new nodes, issue the following command:
5. Run essnettest against the new nodes, ssh to essio3 and run the following command:
Run this command from one of the nodes in the cluster and use this command to test the health of
the high-speed network connections. For more information, see the essnettest command help.
6. Add new nodes to the existing cluster by submitting the following command from the container:
8. Get the list of recovery groups by submitting the following command from one of the new nodes (for example, essio3):
mmvdisk rg list
cd /home/deploy/ess_6.1.3.1_0623-14_dme_ppc64le.dir
./essmgr -r
essrun -N essio3,essio4 gui --add-to-hosts
Note: Ensure that the passwordless ssh is configured from new nodes to EMS.
Note: Assume that the existing file system has the attributes that are mentioned in the command.
For vdisk set name, use a unique name similar to the existing IBM Elastic Storage System vdisk
set. Verify existing disk set names by listing vdisk sets.
b) Define the vdisk set by submitting the following command (this is an example and may vary in different environments):
For more information, see mmvdisk online command reference.
10. Create the vdisk set by submitting the following command:
11. Add the vdisk set to an existing file system by submitting the following command:
cd /home/deploy/ess_6.1.3.1_0623-14_dme_ppc64le.dir
./essmgr -r
essrun -N essio3,essio4 gui --add-to-hosts
Note: Ensure that the passwordless ssh is configured from new nodes to EMS.
After you complete the procedure, do the following steps:
1. Reconfigure call home, for more information, see Configuring call home.
2. Rerun the GUI wizard to identify the new nodes.
From a web browser, log in to the EMS over the management network and select Edit Rack Components to rerun the wizard discovery.
podman ps -a
podman stop ContainerName
podman ps -a
podman rm ContainerName -f
podman images
podman image rm ImageID -f
nmcli c
ifup mgmt_bridge
ifup fsp_bridge
Note: An IP address must be set on the mgmt (management) and fsp (FSP) interfaces.
Disk call home for ESS 5000, ESS 3000, ESS 3200, ESS 3500, and
ESS Legacy
The IBM Spectrum Scale RAID pdisk is an abstraction of a physical disk. A pdisk corresponds to exactly one physical disk, and belongs to exactly one declustered array within exactly one recovery group.
Figure: ESS disk call home flow. The EMS node and the I/O server nodes/canisters (ESS 5000, ESS 3000, or ESS 3200) each run RHEL with GNR core and a callback script; the callback script sends disk replace events to the IBM Electronic Service Agent (ESA) as ESS call home events.
From ESS 3500, the unified call home is used to create service requests. Although ESA is deprecated and
will be removed, it is supported as a backup for the next releases.
If a cluster has EMS nodes and at least one ESS 3500 node, the unified call home is used to create service tickets. For clusters without any ESS 3500 nodes, ESA is used to create service tickets.
# cd /install/ess/otherpkgs/rhels8/ppc64le/ess/
# yum install esagent.pLinux-4.5.7-0.noarch.rpm
After ESA is successfully configured, you need to configure ESS systems to generate call home events.
Entities or systems that can generate events are called endpoints. The EMS, I/O server nodes, and
attached enclosures can be endpoints in ESS. Servers and enclosure endpoints can generate events.
Servers can generate hardware events for components such as the CPU, DIMMs, or the OS disk. Typically, these events are also logged in the OPAL log.
In ESS, ESA is only installed on the EMS node, and it automatically discovers the EMS as
PrimarySystem. The EMS node and I/O server nodes must be registered to ESA as endpoints.
The esscallhomeconf command is used to perform the registration task. The command also registers
enclosures attached to the I/O servers by default.
Software call home can also be registered based on the customer information given while configuring
the ESA agent. A software call home group auto is configured by default and the EMS node acts as the
software call home server. The weekly and daily software call home data collection configuration is also
activated by default. The software call home uses the ESA network connection settings to upload the data
to IBM. The ESA agent network setup must be complete and working for the software call home to work.
Activate and configure ESA and then configure call home as follows.
Configuration of ESA or unified call home
You can configure ESA or unified call home by using the following esscallhomeconf command. For
more information, see ESS CLI esscallhomeconf.
optional arguments:
-h, --help show this help message and exit
-E ESA-AGENT Provide nodename for esa agent node
--prefix PREFIX Provide hostname prefix. Use = between --prefix and
value if the value starts with -.
--suffix SUFFIX Provide hostname suffix. Use = between --suffix and
value if the value starts with -.
--verbose Provide verbose output
--esa-hostname-fqdn ESA_HOSTNAME_FQDN
Fully qualified domain name of ESA server for
certificate validation.
--stop-auto-event-report
Stop report of automatic event to ESA in case of any
hardware call home event reported to system.
-N NODE-LIST Provide a list of nodes to configure.
--show Show call home configuration details.
--register {node,all}
                     Register endpoints (nodes, enclosures, or all) with ESA for
                     hardware call home.
--icn ICN            Provide IBM Customer Number for software call home.
--serial SOLN-SERIAL Provide ESS solution serial number.
--model SOLN-MODEL Provide ESS model. Applicable only for BE (ppc64)
models.
--proxy-ip PROXY-HOSTNAME
Provides the IP address or the hostname for the proxy configuration.
--proxy-port PROXY-PORT
Provides the port number for the proxy configuration.
--proxy-userid PROXY-USERNAME
Provides the user ID for the proxy configuration.
--proxy-password PROXY-PASSWORD
Provides the password for the proxy configuration.
There are several switches which start with ESA_CONFIG that can be used with the --esa-config
switch of the esscallhomeconf command to activate ESA by using the CLI.
Attention: You can configure software call home without running the esscallhomeconf
command on the ESS system by using the mmcallhome command. However, it is recommended to
not enable software call home with mmcallhome. Instead, use the esscallhomeconf command
for this purpose on ESS systems including ESS 3000, ESS 3200, ESS 5000, and ESS Legacy 5.x
systems.
During ESS 3500 installation, the GUI setup runs the esscallhomeconf command, and ESA and the unified call home are configured. To ensure that ESA and the unified call home are configured and support earlier ESS versions, reconfigure call home by using the esscallhomeconf command (and not the ESA web user interface).
This example shows that ESA is activated and configured, and that nodes, including enclosures, are registered as part of a single command invocation. Software call home is set up by using the same command line.
If for some reason configuration of ESA is successful but configuration of call home fails, it can be done
separately by using the esscallhomeconf command.
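A sketch of a single esscallhomeconf invocation that registers the EMS and I/O nodes with their enclosures and sets up software call home; every value is a placeholder and the available switches are listed above, so adjust the invocation to your environment:

esscallhomeconf -E ems1 -N ems1,essio1,essio2 --register all --icn 1234567 --serial 212AB1A --model 5105-22E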
Examples of successful output of test call home connectivity (essinstallcheck) from the ESA web user
interface for ESS 5000, ESS 3000, ESS 3200, and ESS Legacy (not ESS 3500):
--proxy-ip PROXY-IP-HOSTNAME
Provide hostname or IP for proxy
configuration.
--proxy-port PROXY-PORT
Provide port number for proxy
configuration.
--proxy-userid PROXY-USERNAME
Provide userid for proxy configuration.
Proxy configuration requires ESA or unified call home to be configured; otherwise, it fails.
This is an example of configuring the proxy along with the configuration of ESA for call home events.
[I] ESA is activated but the configuration was not done. >>>> ESA was activated.
[I] Activating ESA via CLI using information provided by --esa-config switch
[I] Successfully activated the ESA with customer detail...
[I] Successfully setup Proxy >>> proxy set successfully.
[E] Unable to change the setting to stop automatic reporting of hardware event to ESA.
Connection Number 2
Type Proxy
Proxy IP address or host name host.example.com
Proxy port 5028
Destination user name johndoe
This is an example of configuring proxy after the successful configuration of ESA or unified call home for
call home events.
Connection Number 2
Type Proxy
Proxy IP address or host name host.example.com
Proxy port 5028
Destination user name johndoe
successfully with systemid 72fadb281627047372f9ada47ed2fcb4
Feb 23 10:27:05 essems1 /esscallhomeconf: [I] End point enclosure EB15094 registered
successfully with systemid b266c524642846255f38a493e99bf10a
Feb 23 10:27:05 essems1 /esscallhomeconf: [I] End point enclosure EB15090 registered
successfully with systemid 8f9a45df3eb6137f6890ab18cf4c2957
Feb 23 10:27:28 essems1 /esscallhomeconf: [I] ESA configuration for ESS Call home is complete.
Feb 23 10:28:04 essems1 /esscallhomeconf: [I] Software callhome configuration completed.
Attention: The esscallhomeconf command also configures the IBM Spectrum Scale call home
setup. The IBM Spectrum Scale call home feature collects files, logs, traces, and details of certain
system health events from the I/O and EMS nodes and services running on those nodes. These
details are shared with the IBM support center for monitoring and problem determination. For
more information on IBM Spectrum Scale call home, see IBM Spectrum Scale documentation in
IBM Documentation.
Note: The ESS 3000, ESS 3200, and ESS 3500 hardware call home is backed by software call home. In
other words, software call home must be configured by using the esscallhomeconf command, without
the --no-swcallhome switch in the ESS 3000 or the ESS 3200 environment. Otherwise, the ESS 3000
or the ESS 3200 hardware failure events are not reported to ESA and a PMR does not get opened.
The endpoints are visible in the ESA portal after registration, as shown in the following figure:
Name
Shows the name of the endpoints that are discovered or registered.
SystemHealth
Shows the health of the discovered endpoints. A green icon (√) indicates that the discovered system is working fine. A red (X) icon indicates that the discovered endpoint has a problem.
ESAStatus
Shows that the endpoint is reachable. It is updated whenever there is a communication between the
ESA and the endpoint.
SystemType
Shows the type of system being used. Following are the various ESS device types that the ESA
supports.
Detailed information about the node can be obtained by selecting System Information. Here is an
example of the system information:
When an endpoint is successfully registered, the ESA assigns a unique system identification (system id) to
the endpoint. The system ID can be viewed using the --show option.
For example:
"5ad0ba8d31795a4fb5b327fd92ad860c": "essio42-ce"
}
When an event is generated by an endpoint, the node associated with the endpoint must provide the
system id of the endpoint as part of the event. The ESA then assigns a unique event id for the event. The
system id of the endpoints are stored in a file called esaepinfo01.json in the /vpd directory of the
EMS and I/O servers that are registered. The following example displays a typical esaepinfo01.json
file:
# cat /vpd/esaepinfo01.json
{
"encl": {
"78ZA006": "32eb1da04b60c8dbc1aaaa9b0bd74976"
},
"esaagent": "ems4",
"node": {
"ems4-ce": "6304ce01ebe6dfb956627e90ae2cb912",
"essio41-ce": "a575bdce45efcfdd49aa0b9702b22ab9",
"essio42-ce": "5ad0ba8d31795a4fb5b327fd92ad860c"
}
}
The endpoints are visible in the ESA portal after registration. For more information, see IBM Spectrum
Scale call home documentation.
Figure 11. ESA portal showing enclosures with drive replacement events
The rest of the information in this section applies to ESA on ESS 5000, ESS 3000, and ESS 3200.
The following figure shows an example of the problem description.
Figure 12. Problem Description
Name
It is the serial number of the enclosure containing the drive to be replaced.
Description
It is a short description of the problem. It shows ESS version or generation, service task name and
location code. This field is used in the synopsis of the problem (PMR) report.
SRC
It is the Service Reference Code (SRC). An SRC identifies the system component area (for example, DSK XXXXX) that detected the error, and additional codes that describe the error condition. It is used by the support team to perform further problem analysis and to determine the service tasks that are associated with the error code and event.
Time of Occurrence
It is the time when the event is reported to the ESA. The time is reported by the endpoints in the UTC
time format, which ESA displays in local format.
Service request
It identifies the problem number (PMR number).
Service Request Status
It indicates reporting status of the problem. The status can be one of the following:
Open
No action is taken on the problem.
Pending
The system is in the process of reporting to the IBM support.
Failed
All attempts to report the problem information to the IBM support have failed. The ESA automatically retries several times to report the problem. The number of retries can be configured. Once failed, no further attempts are made.
Reported
The problem is successfully reported to the IBM support.
Closed
The problem is processed and closed.
Local Problem ID
It is the unique identification or event id that identifies a problem.
If an event is successfully reported to the ESA, and an event ID is received from the ESA, the node reporting the event uploads additional support data to the ESA, which is attached to the problem (PMR) for further analysis by the IBM support team.
The callback script logs information in the /var/log/messages file during the problem reporting episode. The following examples display the messages logged in the /var/log/messages file generated by the essio11 node:
• Callback script is invoked when the drive state changes to replace. The callback script sends an event
to the ESA:
• The ESA responds by returning a unique event ID for the system ID in the json format.
Call home monitoring of ESS 5000, ESS 3000, ESS 3200, and ESS Legacy
systems and their disk enclosures
A callback is a one-time event; it is triggered only when the disk state changes to replace. If ESA misses the event, for example because the EMS node is down for maintenance, the call home event is not generated by ESA.
Important: Information in this section is not applicable for ESS 3500, because ESS 3500 uses the unified
call home to monitor ESS systems and disk enclosures.
To mitigate this situation, the callhomemon.sh script is provided in the /opt/ibm/ess/tools/samples directory of the EMS node. This script checks for pdisks that are in the replace state, and sends an event to ESA to generate a call home event if there is no open PMR for the corresponding physical drive. This script can be run at a periodic interval, for example, every 30 minutes.
In the EMS node, create a cron job as follows:
1. Open the crontab editor by using the following command:
# crontab -e
2. Add the following entry so that the script runs every 30 minutes:
*/30 * * * * /opt/ibm/ess/tools/samples/callhomemon.sh
3. Verify the crontab entry:
# crontab -l
*/30 * * * * /opt/ibm/ess/tools/samples/callhomemon.sh
The call home monitoring protects against a missed call home when ESA misses a callback event. If a problem report is not already created, the call home monitoring ensures that a problem report is created.
Note: When the call home problem report is generated by the monitoring script, as opposed to being triggered by the callback, the problem support data is not automatically uploaded. In this scenario, IBM support can request the support data from the customer.
Upload data
The following support data is uploaded when a disk in an ESS 5000, ESS 3000, ESS 3200, ESS 3500, or ESS Legacy system enclosure displays a drive replace notification.
• The output of the mmlspdisk command for the pdisk that is in the replace state. A sample query is shown after this list.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
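For reference, the pdisks that are in the replace state can also be listed manually from any node in the storage cluster. This is a hedged example; mmlspdisk and its --replace option are part of IBM Spectrum Scale RAID, and the output depends on your configuration:
# mmlspdisk all --replace     # list only the pdisks that are marked for replacement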
The following support data is uploaded when the system displays any hardware issue in an ESS 5000 or
an ESS Legacy system.
• The output of the opal_elog_parse command for the serviceable event that caused failure.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
The following support data is uploaded when the system displays any hardware issue in an ESS 3000 or
an ESS 3200 system.
• The output of the mmhealth command and the actual component that caused failure.
• Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
– Last 10000 lines of mmfs.log.latest from the node which generates the event.
– Last 24 hours of the kernel messages (from journal) from the node which generates the event.
Uninstalling, reinstalling, and troubleshooting the IBM Electronic
Service Agent
The ESA agent is preinstalled in the EMS node from the factory.
Issue the following command to remove the rpm if needed:
Issue the following command to reinstall the rpm files for the ESA agent:
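The remove and reinstall commands are not reproduced in this excerpt. As a generic, hedged sketch only (the esagent package name pattern is an assumption; verify the actual ESA package name on your EMS node before removing anything):
# rpm -qa | grep -i esagent      # locate the installed ESA package (name pattern is an assumption)
# yum remove <EsaPackageName>    # remove the package if needed
To reinstall, use the ESA package that is provided with your ESS release.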
Verify ESA connectivity to IBM by issuing the following command:
/opt/ibm/esa/bin/verifyConnectivity -t
• ESA test call home - Test call home from the ESA portal. Go to All systems > System Health for the
endpoint from which you would like to generate a test call home. Click Send Test Problem from the
newly opened Problems tab.
• ESS call home script setup to ensure that the callback script is set up correctly.
Verify that the periodic monitoring is set up.
crontab -l
[root@ems1 deploy]# crontab -l
*/30 * * * * /opt/ibm/ess/tools/samples/callhomemon.sh
esscallhomeevent --event test
Note: This applies to both ESA and unified call home and all flavors of ESS.
Post setup activities
Perform the following post setup activity.
• Delete any test problems.
# essinstallcheck -N localhost
Start of install check
nodelist: localhost
Getting package information.
[WARN] Package check cannot be performed other than on EMS node. Checking nodes.
================== Summary of node: localhost =============================
[INFO] Getting system firmware level. May take a long time...
[INFO] Getting system profile setting.
[INFO] Spectrum Scale RAID is not active, cannot get gpfs Installed
version:
[OK] Linux kernel installed: 4.18.0-193.79.1.el7.ppc64le
[ERROR] Systemd not at min recommended level: 239-31.el8_2.8.ppc64le
[ERROR] Networkmgr not at min recommended level: 1.22.8-9.el8_2.ppc64le
[OK] Mellanox OFED level: MLNX_OFED_LINUX-4.9-3.1.5.0
[OK] IPR SAS FW: 19512B00
[OK] ipraid RAID level: 10
[ERROR] ipraid RAID Status: found Degraded expected Optimized
[OK] IPR SAS queue depth: 64
[ERROR] System Firmware : found FW860.81 (SV860_215) expected min
FW860.90 (SV860_226)
[OK] System profile setting: scale
[OK] System profile verification PASSED.
[INFO] Cluster not yet created skipping rsyslog check
[OK] Host adapter driver: 34.00.00.00
Performing Spectrum Scale RAID configuration check.
[OK] New disk prep script: /usr/lpp/mmfs/bin/tspreparenewpdiskforuse
[OK] Network adapter MT4099 firmware: 16.27.2008, net adapter count: 3
[OK] Network adapter firmware
[INFO] Storage firmware check is not required as GPFS cluster does not exist.
[OK] Node is not reserving KVM memory.
[OK] IBM Electronic Service Agent (ESA) is activated for Callhome service.
[OK] Software callhome check skipped as cluster not configured.
End of install check
[PASS] essinstallcheck passed successfully
You can view two more lines in the essinstallcheck output, which indicate that ESA is activated (ESA activation indicates that the hardware call home is also configured for this ESS) and that software call home has been configured for this node. This is an important check that enables customers to verify the hardware and software call home configuration after the cluster creation and the file system creation are done.
Remember: Enable the hardware and the software call home at the end of the ESS system deployment
when the file system is active, nodes are ready to serve the file system, and none of the configuration is
pending.
General Information
• Primary Contact for Providing Site Access: (Required)
• Country or Region: (Required)
Note: You need to provide the Alpha-2 Code for this entry. You can obtain this information by searching for the country code on ISO Online Browsing Platform.
• State or Province: (Required)
• Postal Code: (Required)
• City: (Required)
• Street Address: (Required)
• Telephone Number: (Required)
• Street Address: (Optional)
• City: (Optional)
• State or Province: (Optional)
• Country or Region: (Required)
Note: You need to provide the Alpha-2 Code for this entry. You can obtain this information by searching for the country code on ISO Online Browsing Platform.
• Postal Code: (Optional)
• Fax Number: (Optional)
• Telephone Number: (Required)
• E-mail Address: (Required)
• Pager Number: (Optional)
• Telephone Number: (Required)
• E-mail Address: (Required)
• Port: (Required)
• User ID: (Optional)
• Password: (Optional)
• Community: (Required)
• Port: (Required)
IBM ID Settings
IBM provides customized web tools and functions that use information collected by IBM Electronic
Service Agent. The access to these functions is managed by an association between your IBM ID(s) and
the Electronic Service Agent information from your systems. The association is made using this page. To
obtain an IBM ID, which is used by many IBM web sites, go to http://www.ibm.com/registration
Appendix E. Security-related settings in ESS
The following topics describe how to enable security-related settings in ESS.
• “Working with firewall in ESS” on page 73
• “Working with sudo in ESS ” on page 75
• “Working with Central Administration mode in ESS” on page 76
# firewall-cmd --state
running
You can verify the open firewall ports by running the firewall sub-command with the verify option.
When the command completes, the required ports in the firewall are verified.
• Enable firewall on I/O server nodes by running the firewall sub-command with the enable option.
# firewall-cmd --state
running
You can verify the open firewall ports by running the firewall sub-command with the verify
option. When the command completes, the required ports in firewall are verified.
• Disable firewall on the EMS node by running the firewall sub-command with the disable option.
• Disable firewall on I/O server nodes by running the firewall sub-command with the disable
option.
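If you need to inspect or adjust individual ports outside of the firewall sub-command, the standard firewalld commands can be used directly on a node. The following is a hedged sketch; port 1191/tcp (the IBM Spectrum Scale daemon port) is shown only as an illustration:
# firewall-cmd --state                          # confirm that firewalld is running
# firewall-cmd --list-ports                     # list the currently open ports
# firewall-cmd --permanent --add-port=1191/tcp  # example: open the GPFS daemon port
# firewall-cmd --reload                         # apply the permanent change
# firewall-cmd --query-port=1191/tcp            # verify that the port is open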
Note: Make sure that you reboot the node when the selinux sub-command completes.
b) Reboot the node.
# systemctl reboot
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
c) Rerun the selinux sub-command with the enable option to enforce SELinux.
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
After SELinux is enabled, the kernel logs SELinux activity in the /var/log/audit/audit.log file. An example of inspecting this log is shown after these steps.
• Enable SELinux on I/O server nodes as follows.
a) Run the selinux sub-command on the I/O server nodes.
Note: Make sure that you reboot the node when the selinux sub-command completes.
b) Reboot the I/O server nodes.
# systemctl reboot
Reboot the node after the command completes. When the node comes up after the reboot, SELinux is disabled.
You can check the status as follows.
# sestatus
SELinux status: disabled
• To disable SELinux on the I/O server nodes, use the following command.
Reboot the node after the command completes. When the node comes up after the reboot, SELinux is disabled. Any I/O server node name can also be used instead of the group name.
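To inspect the SELinux activity that is recorded in the /var/log/audit/audit.log file after SELinux is enabled, the standard Red Hat Enterprise Linux audit tools can be used. This is a generic sketch, not an ESS-specific procedure:
# ausearch -m AVC -ts recent                    # show recent SELinux denial (AVC) records
# grep denied /var/log/audit/audit.log | tail   # quick check of the raw audit log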
Additional information: Each of the security items described here is optional and can be enabled on demand for an ESS cluster. Security commands can be run by using the essrun command after deployment of the node is done and before the GPFS cluster is created. In upgrade cases, any such security commands must be run after stopping the GPFS cluster. Do not attempt to run any security command while the GPFS cluster is up and running.
Container consideration: Make sure that none of the security commands are run against the container node. The container has a very light footprint of the Red Hat Enterprise Linux 7.x OS, on which these security parameters are not supported.
Note: To configure a sudo user, see the essrun sudo command in the ESS Command Reference.
This command creates the gpfsadmin Linux user and the gpfs Linux group on the node and performs all necessary sudoers setup. For detailed information, see the /etc/sudoers.d/ess_sudoers file.
Users can now log in to the node as the gpfsadmin user and perform GPFS administration tasks.
Make sure that the sudo sub-command is run on all GPFS nodes (EMS node, I/O server nodes, and any
client nodes) as part of the cluster to be completely compliant with the sudo requirement. Change the
node name in the sudo sub-command accordingly. Enabling sudo also allows the gpfsadmin user to
administer xCAT and the GPFS GUI on the EMS node.
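To confirm the result of the sudo sub-command on a node, you can check the user, group, and sudoers rules that it creates. This is a hedged verification sketch based on the names mentioned above:
# id gpfsadmin                       # confirm that the gpfsadmin user and gpfs group exist
# sudo -l -U gpfsadmin               # list the commands that gpfsadmin can run through sudo
# cat /etc/sudoers.d/ess_sudoers     # review the sudoers rules created by the sub-command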
Disabling sudo reverts the xCAT policy table to its previous state, deletes /etc/sudoers.d/
ess_sudoers file, and deletes the gpfsadmin user from the Linux node. Make sure that you have
disabled sudo user configuration on all GPFS nodes (EMS node, I/O server nodes, and any client nodes) as
part of the cluster to be completely compliant with the sudo requirement. Change the node name in the
sudo sub-command accordingly.
Important: You must not disable the sudo user until the GPFS cluster is reconfigured not to use the sudo wrapper and the sudo user. Failing to do so might result in cluster corruption.
Note: You can disable root in all ESS nodes except ESS 3000.
Note: After running this command, any future deployment of new nodes has the adminMode attribute set to central by default. For existing nodes in the cluster, you must update the xCAT security context by running the following command.
2. Update the xCAT security context using the updatenode Node -k script.
# updatenode gss_ppc64,ces_ppc64 -V -k
...
Password: <Type EMS node root Password here>
...
...
Note:
• If you do not run the updatenode Node -k command, the central administration mode gets
enabled for any new nodes deployed using the current EMS node. However, existing nodes can still
do passwordless SSH between each other.
• In case of an upgrade, if you want to enable the central administration mode then run the same
commands.
• Make sure that you do not run updatenode admin_node -V -k on the EMS node which is the
admin node.
• Running the admincentral sub-command against non-container nodes is not allowed. In other
words, with the -N option the container node name must be specified as an argument.
The admincentral sub-command can be run after the deployment of the EMS node, I/O server nodes,
or protocol nodes is completed.
Note: After running this command, any future deployment of new nodes has the central administration mode disabled. For existing nodes in the cluster, you must update the xCAT security context by running the following command.
2. Update the xCAT security context using the updatenode Node -k script.
# updatenode gss_ppc64,ces_ppc64 -V -k
...
Password: <Type EMS node root Password here>
...
...
Note:
• If you do not run the updatenode Node -k command, the central administration mode gets
disabled for any new nodes deployed using the current EMS node. However, existing nodes cannot
do passwordless SSH between each other.
• In case of an upgrade, if you want to disable the central administration mode then run the same
commands.
• Make sure that you do not run updatenode admin_node -V -k on the EMS node which is the
admin node.
• Running admincentral sub-command against non-container nodes is not allowed. In other words,
with the -N option the container node name must be specified as an argument.
positional arguments:
{enable, disable}
optional arguments:
-h, --help show this help message and exit
Note: If you need to set up a node as a server, issue the following command. This command requires the EMS node to have the firewall enabled because NTP requests come from certain ports that are blocked for EMS nodes by default.
cd /install/ess/otherpkgs/rhels8/ppc64le/firmware/
sftp EMSNode
mput 01VL950_092_045.img
update_flash -v -f 01VL950_092_045.img
update_flash -f 01VL950_092_045.img
The system restarts and the firmware is upgraded. This process might take up to 1.5 hours per node.
Note: If you plan to upgrade the POWER8 EMS firmware, you can retrieve the code
(01SV860_236_165.img file) from the following location inside the container:
/install/ess/otherpkgs/rhels7/ppc64le/firmware/
Follow the same steps to upgrade the POWER8 EMS firmware. The level after the upgrade will be
SV860_236_165 (FW860.A2). The level after upgrade of the POWER9 firmware will be FW950.30
(VL950_092).
/lib/firmware/IBM-ST1200M.A1800017.39463233
/lib/firmware/IBM-ST1800M.A1800017.39463233
/lib/firmware/IBM-ST600MM.A1800017.39463233
Here,
ST1800MM0139
This is the product ID.
/dev/sg1 and /dev/sg2
These are the storage devices.
9F14
This is the device version.
5. Check whether the firmware file matches to the product ID.
6. If the firmware file and the product ID are matching, upgrade the firmware.
/lib/firmware/IBM-ST1800M.A1800017.39463233
9. After both devices are successfully updated, reboot the I/O node.
10. Check whether the firmware version is changed.
Note: The following time server setup documentation is for general reference. You can configure the time
server as suitable for your environment. In the simplest example, the EMS host is used as the time server
and the I/O nodes (or protocol nodes) are used as clients. Customers might want to have all nodes point
to an external time server. Use online references for more detailed instructions for setting up Chrony.
Chrony is the preferred method of setting up a time server. NTP is considered deprecated. Chrony uses
the NTP protocol.
For the following example steps, it is assumed that the EMS node is the chronyd server and there is no
public internet synchronization.
• Do the following steps on the EMS node, outside of the container.
a) Set the time zone and the date locally.
b) Edit the contents of the /etc/chrony.conf file. A sample server and client configuration is shown after these steps.
Note: Replace the server and the allow range with the network settings specific to your setup.
• Do the following steps on the client nodes (canister nodes or ESS nodes).
a) Edit the contents of the /etc/chrony.conf file.
Note: Replace the server and the allow range with the network settings specific to your setup.
chronyc makestep
chronyc ntpdata
timedatectl
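As a reference for the preceding steps, a minimal chrony configuration might look like the following. The IP addresses and subnet are assumptions for illustration only; replace them with the values for your environment.
/etc/chrony.conf on the EMS node (time server, no public internet synchronization):
driftfile /var/lib/chrony/drift
local stratum 8              # serve time even without an upstream source
allow 192.168.45.0/24        # assumed management subnet; replace with your range
/etc/chrony.conf on the I/O or protocol nodes (clients):
server 192.168.45.20 iburst  # assumed EMS IP address; replace with your EMS
driftfile /var/lib/chrony/drift
makestep 1.0 3               # allow stepping the clock on large offsets
After editing the files, restart the service with systemctl restart chronyd on each node, and then use the chronyc and timedatectl commands listed above to verify synchronization.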
Prerequisites
• During file system creation, adequate space is available for CES shared root file system. For more
information, see “During file system creation, adequate space is available for CES shared root file
system” on page 85
• ESS container has the protocol node management IP addresses defined. For more information, see
“ESS container has the protocol node management IP addresses defined” on page 85.
• ESS container has the CES IP addresses defined. For more information, see “ESS container has the CES
IP addresses defined” on page 86.
During file system creation, adequate space is available for CES shared root file
system
In a default ESS setup, you can use the Ansible based file system task to create the recovery groups,
vdisk sets, and file system. By default, during this task, 100% of the available space is attempted to be
consumed. If you plan to include protocol nodes in your setup, you must leave enough free space for the
required CES shared root file system. Use the --size flag to adjust the space consumed accordingly.
For example: essrun -N ess_ppc64le filesystem --suffix=-hs --size 80%
Running this command leaves approximately 20% space available for the CES shared root file system or
additional vdisks. If you are in a mixed storage environment, you might not use the essrun filesystem
task due to more complex storage pool requirements. In that case, when using mmvdisk, make sure that
you leave adequate space for the CES shared root file system. The CES shared root file system requires
around 20 GB of space for operation.
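As a hedged sketch only, a small vdisk set for the CES shared root file system can be defined and created with mmvdisk after the main file system task completes. The vdisk set name, recovery group names, RAID code, block size, and size below are assumptions for illustration; check the mmvdisk documentation for the options that apply to your configuration:
# mmvdisk vs define --vs ces_root --rg rg_essio1-hs,rg_essio2-hs --code 4+2p --bs 4M --ss 20G
# mmvdisk vs create --vs ces_root
# mmvdisk filesystem create --file-system cesSharedRoot --vdisk-set ces_root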
ping IPAddress1,...IPAddressN
Each protocol node must respond to the ping test, indicating that it has an IP address set and that the address is on the same subnet as the container.
2. Run the config load task.
If you have more than one node, you can specify them in a comma-separated list. Make sure that you
add all ESS nodes in this config load command before continuing.
3. Create network bonds.
Note: Make sure that the nodes are connected to the high-speed switch before doing this step.
5. Install IBM Spectrum Scale by using the installation toolkit and set up CES.
Use the IBM Spectrum Scale documentation for installing IBM Spectrum Scale by using the installation
toolkit and for enabling the required services for the customer environment. For more information, see:
• Using the installation toolkit to perform installation tasks: Explanations and examples.
• Adding CES HDFS nodes into the centralized file system.
• ESS awareness with the installation toolkit.
Hardware requirements
The hardware requirements to run protocol nodes as a VM on an I/O node are as follows:
• 8 CPU cores
• 64 GB of RAM
• 64 GB of /essvm file system
These requirements are provided out of the box with ESS 3500.
Protocol VM implementation
The deployment of protocol binaries and services remains the same by using the IBM Spectrum Scale
install toolkit.
Ensure that the following prerequisites are met before you deploy and use a protocol VM in the ESS 3500
environment:
• ESS 3500 nodes are completely deployed.
• A CES shared root file system is created.
See the ESS Quick Deployment Guide to deploy and create a CES shared root file system in the ESS 3500
or any ESS environment.
In the essrun command, a cesvm argument is introduced. The cesvm argument requires a node name with the --create and --delete options, and accepts the number of physical Mellanox cards that the protocol VM uses.
Note: Only one protocol VM can be run on a single I/O node canister. Therefore, for one building block of ESS 3500, you can run two VMs, one per canister node.
The protocol VM implementation supports full functional protocol deployment of the NFS and SMB protocols only. Running the Object protocol in the protocol VM environment is not recommended because of the limited number of CPU cores and amount of RAM assigned to a VM. There is no hard restriction against running the Object protocol, but it might impact the performance of the VMs and produce unexpected results.
The protocol VM implementation does not support a highly scalable NFS and SMB workload. However, this implementation supports small NFS and SMB workloads.
Tip: If you want to run a massive workload by using the NFS protocol and the SMB protocol, you can use the POWER 64 LE (ppc64le) version of the protocol node implementation instead of the VM implementation.
For more information about the POWER protocol node implementation, contact IBM support.
Network designs
Before you set up the protocol VM, you must decide on its network aspect. Usually, ESS 3500 nodes come with two or three Mellanox Ethernet or Mellanox InfiniBand cards, with two ports per adapter.
To run a protocol VM, you must assign one of the Mellanox network adapters (2 ports) to the VM that
provides the following connections:
• CES (first port of the network adapter) connection
• GPFS admin or daemon (second port of the network adapter) connection
In the following figures, two types of network connections are shown:
Where,
cesvm
The cesvm argument in the essrun command can either use the --create or --delete option to
create or delete a VM.
-h, --help
Shows the help message and exit.
--create
Creates a CES VM on an I/O node.
--delete
Deletes a CES VM on an I/O node.
--vm-name
Specifies a user-defined VM name. The default VM name is <Node Name>-essvm. You can optionally specify the name of a VM by using this option. However, do not provide a custom VM name unless required.
--number-of-mellanox-device-passthru 1, 2
Specifies the number of Mellanox cards, minimum 1 card and maximum 2 cards, that you want to pass through to the VM. The second and third cards are passed to the VM according to the specified value.
This option pushes either 1 or 2 Mellanox network cards (each card has two network interfaces) to the
VM depending on the use case. For more information about the network behavior when you pass 1 or
2 cards by using this switch, see Figure 20 on page 90 and Figure 21 on page 90. When this option is
set, the network card on a PCI root device is picked up automatically and pushed to the VM.
PCI root addresses 0000:c0, 0000:80, and 0000:00 are reserved for network cards.
If you select the --number-of-mellanox-device-passthru 1 option, the 0000:00 PCI root
network card is added to the VM. Similarly, if you select the --number-of-mellanox-device-
passthru 2 option, 0000:00 and 0000:80 PCI root network cards are added to the VM. The 0000:c0
PCI root address is reserved for the host network card.
Example
This example shows how to create or delete a VM.
• Create a VM.
/opt/ibm/ess/tools/postscript/ess_instnic.essvm
yum update
/opt/ibm/ess/tools/postscript/ess_ofed.essvm
When a VM is deleted by using the --delete option, the network interfaces are returned to the host.
Note: After a VM is created or deleted, you can run the virsh command to manage the VM.
For more information about the essrun command, see this command in the ESS Command Reference.
Prerequisites
• SSR has completed code 20 on both the ESS 3000 and ESS 5000 nodes (including EMS)
SSR works on Power® nodes and the EMS node first, then the ESS 3000 system.
• Public connection setup on C11-T3 (f3 connection on EMS)
• ESS 3000 and ESS 5000 nodes have been added to /etc/hosts
– Low-speed names: FQDNs, short names, and IP addresses
– High-speed names: FQDNs, short names, and IP addresses (add suffix of low-speed names)
• Host name and domain set on EMS
• Latest code for ESS 3000 and ESS 5000 stored in /home/deploy on EMS
• For information on how to deploy the ESS system, see ESS 3000 Quick Deployment Guide.
• For information on using the mmvdisk command, see mmvdisk in ESS documentation.
Note: This command creates a combined data and metadata vdisk in the system pool. The file system
name must be fs3k.
Type exit and press Enter to exit the container. Proceed with the instructions on how to set up the collector, sensors, and run the GUI wizard.
The status of the current ESS 3000 container should be exited. To confirm, use the podman ps -a
command. For example:
If the ESS 3000 container is not in the stopped state, use the podman stop ContainerName command.
Note: If you plan to add protocol nodes in the cluster, include them in the list of nodes that you are
specifying in this command.
2. Update the nodes.
Note:
• Use the high-speed names.
• If there is an error, you might need to log in to each ESS 5000 node and start GPFS.
mmbuildgpl
mmstartup
Note: The full list of node types that you can add to any environment is as follows (for usage with --nodetype):
• For EMS node, use ems.
• For Legacy ESS IO ppc64le Server node, use ess5x.
• For ESS 3000, use ess3k.
• For ESS 5000, use ess5k.
• For ESS 3200, use ess3200.
• Default is ems.
Type exit and press Enter to exit the container. Running these commands takes you to the ESS 5000 node.
5. Create mmvdisk artifacts.
a. Create the node class.
d. Define vdiskset.
mmvdisk vs define --vs vs_fs5k_1 --rg ess5k_rg1,ess5k_rg2 --code 8+2p --bs 16M --ss 80%
--nsd-usage dataOnly --sp data
e. Create vdiskset.
Note: You need to understand the implications of this rule before applying it in your system. When capacity on the ESS 3000 reaches 75%, it migrates files (larger ones first) out of the system pool to the data pool until the capacity reaches 25%. A sample rule of this kind is shown after these steps.
h. On the EMS node, run the following command.
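The threshold-based migration described in the preceding note is typically expressed as an IBM Spectrum Scale policy rule. The following is a sketch only; the rule name and pool names are assumptions, and the actual rule applied by your deployment may differ:
RULE 'MigrateOffSystemPool'
  MIGRATE FROM POOL 'system'
  THRESHOLD(75,25)
  WEIGHT(KB_ALLOCATED)
  TO POOL 'data'
A rule like this is installed with the mmchpolicy command; the WEIGHT(KB_ALLOCATED) clause causes larger files to be migrated first.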
At this point, add the ESS 5000 nodes to the pmsensors list and use the Edit rack components option in
the GUI to slot the new nodes into the frame.
If you want to add protocol nodes, see Appendix I, “ESS protocol node deployment by using the IBM
Spectrum Scale installation toolkit,” on page 85.
# /bin/mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
4. Convert the P1 port of the device listed in the preceding command to Ethernet from InfiniBand. A hedged example of this conversion is shown after these steps.
5. Reboot the node and query the port type of all attached devices again.
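A hedged example of the conversion in steps 4 and 5, using essgennetworks options that are documented later in this document (the device name is an assumption; query your own devices first):
# essgennetworks -N localhost --query --devices all
# essgennetworks -N localhost --devices <MellanoxDeviceName> --change Ethernet --port P1
Reboot the node afterward, as described in step 5, and query the port type again to confirm the change.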
You can run the following to see configuration parameter settings without setting them:
/usr/lpp/mmfs/samples/gss/gssClientConfig.sh -D
After running this script, restart GPFS on the affected nodes for the optimized configuration settings to
take effect.
Important: Do not run gssClientConfig.sh unless you fully understand the impact of each setting on
the customer environment. Make use of the -D option to decide if all or some of the settings might be
applied. Then, individually update each client node settings as required.
Supported paths
SL models (5U92)
• SL1 -> SL2
• SL2 -> SL3
• SL2 -> SL4
• SL3 -> SL4
• SL3 -> SL5
• SL4 -> SL5
• SL4 -> SL6
• SL5 -> SL6
• SL5 -> SL7
• SL6 -> SL7
SC models (4U106)
• SC1 -> SC2
• SC2 -> SC3
• SC2 -> SC4
• SC3 -> SC4
• SC3 -> SC5
• SC4 -> SC5
• SC4 -> SC6
• SC5 -> SC6
• SC5 -> SC7
• SC6 -> SC7
• SC6 -> SC8
• SC7 -> SC8
• SC7 -> SC9
• SC8 -> SC9
Prerequisites
1. All new or existing building blocks must be at ESS 6.1.0.0 or later. If there are protocol nodes in the
setup, they must also be upgraded to the matching ESS version.
2. If space needs to be made, for example for moving the EMS, this must be planned for accordingly.
3. LBS must wear an ESD wrist band when physically working on the hardware (like plugging in SAS
cables).
SSR tasks
SSR is responsible for the following tasks.
1. Code 20 of the new enclosures - replacing parts as needed.
2. Running or labeling the new SAS cable connections.
3. Potentially making space in the frame - Moving the EMS.
SSR is not responsible for checking system health using essutils like in a rackless or a rackful solution.
LBS tasks
LBS is responsible for the following tasks.
1. Upgrade of ESS 6.1.0.0 - prior to the capacity upgrade engagement.
2. Post capacity upgrade health checks.
3. Plugging the SAS cables into the adapters and enclosures.
4. Performing capacity upgrade software functions such as conversion and resizing.
5. New storage management functions such as adding new space to existing file system and creating a
new file system.
6. Restriping the file system.
7. Replacing any bad parts such as disks or cables.
8. Pre and post engagement operations
Flow
The TDA process ensures that the customer is prepared for the capacity upgrade. Considerations such as whether there is enough room in the rack and how the file system space is used are planned out.
LBS
1. LBS performs the normal ESS software upgrade. The customer must be at ESS 6.1.0.0 for the capacity upgrade. This upgrade is treated as a separate engagement from the future capacity upgrade operation.
Running mmvdisk recoverygroup list should show both RGs actively managed by
essio2-hs.
6. Plug in the SAS cables for essio1 on the server and enclosure ends. Shut down GPFS, only on the
server just modified, and then reboot the I/O node. Wait for 5 minutes for the node to reboot and
paths to be rediscovered. Run the following commands to ensure that essio1 has discovered the
new enclosures.
Note: Before shutting down GPFS, make sure that autoload is turned off (mmchconfig
autoload=no).
a. essstoragequickcheck -N localhost
b. essfindmissingdisks -N localhost
Both commands should return with no issues and recognize the new enclosure and disk counts.
The paths should also be without error. After this is complete, start IBM Spectrum Scale on the
node in question by using mmstartup. After determining that IBM Spectrum Scale is active by
using mmgetstate proceed to the next step.
7. Move the recovery group ownership to essio1-hs. Use the same commands as used in this step
but make sure to use the correct node name (essio1-hs).
After the preceding steps are complete and the new enclosures have been successfully cabled to both servers, proceed with the following final steps.
8. Rebalance both recovery groups by running the following commands from any node in the storage cluster.
a. mmvdisk rg list
b. mmvdisk recoverygroup change --recovery-group rg1 --active essio1-hs
c. mmvdisk recoverygroup change --recovery-group rg2 --active essio2-hs
d. Check that the ownership has changed using the mmvdisk recoverygroup list
command.
9. Perform the system verification steps again before proceeding.
10. Update enclosure and drive firmware. If there are any issues, you should stop and replace any
disks or enclosures that could not be updated for some reason.
CurrentIoServer implies running the command from either of the I/O server nodes in the building block.
Note: It might take up to an hour for the firmware upgrade to complete. You might notice that the
fan starts to run at high speed. This is a known issue.
a. CurrentIoServer$ mmchfirmware --type storage-enclosure
b. CurrentIoServer$ mmchfirmware --type drive
c. mmhealth node show -N all --verbose - This command shows any system health related issues to address. (Run from any node in the storage cluster.)
d. gnrhealthcheck - This command determines if there are any issues in various areas of ESS.
Any problems that show up must be addressed before capacity upgrade starts.
11. Add new storage into recovery groups.
mmvdisk rg resize --rg rg_essio1-hs,rg_essio2-hs -v no
12. Verify that the new storage is available and the DA is rebalancing.
13. Start up the GUI and use Edit rack components to have the GUI discover the new topologies and
make changes to the frame accordingly. Changes such as modify ESS model to consume more U
space, move EMS, and so on.
14. Reconfigure call home.
At this point, discussions with the customers need to occur on what to do with the free space.
1. Add to the existing file system?
a. See the add building block flow in ESS 5.3.x Quick Deployment Guide for tips on creating new
NSDs and adding to an existing file system.
b. See the add building block flow (Appendix B, “Adding additional nodes or building block(s),” on
page 45) for tips on creating new NSDs and adding to an existing file system.
c. Consider file system restripe at the end which might take time. (mmrestripefs FileSystem -b)
2. Create a new file system.
• See the installation section on how to use essrun on creating a new file system from inside the
container. You may also use mmvdisk commands directly to perform this operation.
[Figure: Ethernet ports 1 - 12 for ESS 3200, with EMS connections SMN/C11/T1 and SMN/C11/T2]
The system displays the 11S serial number similar to the following:
01FT690YA50YD7BGABX
4. Change the default password to the 11S password by using the following command:
cumulus@accton-1gb-mgmt:~$ passwd
5. Log in through SSH or console and log in with the new 11S password to validate the changes.
Note: The default password must be set to the 11S serial number 01FT690YA50YD7BGABX. If not,
the password must be CumulusLinux!.
[Figure: switch management panel showing the Mgmt0 port, console port, and USB port]
sudo su -
9. Copy the contents of the interface file to the file name /etc/network/interfaces and save the
file.
Note: You can use vi or modify this file.
10. Reload the interfaces by using the following command:
root@cumulus:/etc/network# ifreload -a
root@cumulus:/etc/network# ifquery -a
12. If required, set the switch network. It is recommended to set a static IP address so that you can log in to the switch remotely. For example, on the 192.168.45.0/24 network, the switch IP address is 192.168.45.60 and the gateway is 192.168.45.1.
• net add interface eth0 IP address 192.168.45.60/24
• net add interface eth0 IP gateway 192.168.45.1
• net pending
# Set tag
/bin/ipmitool lan set 1 vlan id 101
# Confirm tag
/bin/ipmitool lan print 1 | grep -i 'VLAN ID'
auto swp10
iface swp10
bridge-pvid 102
bridge-vids 101
Any ports that you designate as IBM Elastic Storage System ports need to have this configuration.
Consult the default IBM Elastic Storage System interfaces file for more information.
4. Copy the new interfaces file to the switch.
5. Reload and verify the interfaces.
6. Set the VLAN tags on the IBM Elastic Storage System canisters.
# Bridge setup
Goal
The goal is to enable the customer or SLS to swap out all the POWER8 ESS nodes in the cluster with the
new POWER9 (5105-22E) nodes without taking the cluster or the file system down.
High-level flow
Example environment:
• 1 x POWER8 EMS
• 2 x POWER8 ESS GLxC building-blocks
– Each building block is in its own failure group
– Metadata replication between failure groups
Appendix Q. EMS network card port assignment
The ports on POWER8 and POWER9 EMS node network card need to be assigned as follows.
The resources that would typically be needed for communication by using TCP/IP are saved and can be used for the real workload of the applications.
RoCE Setup
You can deploy a RoCE environment by using one of the following methods:
• The simplest way is to use one adapter port per node that is connected to the network. It is achieved by
using the same port for TCP/IP daemon communication and RoCE traffic concurrently.
• You can use multiple ports that are connected to a network. Use one port for TCP/IP daemon communication and RDMA in parallel, and all the other ports for RDMA and RoCE traffic. You can scale out the bandwidth capabilities of a node to the targeted numbers by using this configuration. It is a more complex but powerful and flexible method for deploying a RoCE environment.
• Mellanox and IBM together introduce a method to configure a network bond, consisting of two ports from the same adapter, to protect against cable, port, or switch issues.
Note: For the best performance result and reliability, configure your network as a lossless network.
Advantages and disadvantages to use or avoid bonded configuration are discussed later.
Network requirements
For running RDMA over Ethernet, you can get the best performance when the network is configured as
a lossless network. Depending on the vendor, components, and the topology, the requirement for the
lossless network can quickly become complex and is out of scope.
For more information, see Network configuration examples for RoCE deployment.
Make sure that all the nodes in your cluster have a recent Mellanox OFED driver that is installed properly. The ESS I/O server nodes are maintained by an IBM Spectrum Scale deployment. You can check the installed version by using the ofed_info -s command as shown in the following example:
# ofed_info -s
MLNX_OFED_LINUX-x.x.x.x.x
Note: The minimum level of the OFED version is documented in the release notes of the Elastic Storage Server.
Ensure that the IBM Spectrum Scale client nodes run the same MOFED level as the NSD servers and ESS nodes.
However, in many projects and environments, it is a challenge to maintain all nodes with the same MOFED
level.
It is possible that the client nodes are running the OFED software, which is distributed by the operating
system. Such configurations are simple to operate and to maintain in the client clusters. However, in cases
of trouble and network glitches, such configurations can cause unexpected failures.
Network topology
The acronym host is used for an endpoint in the network. It can be an IBM Spectrum Scale client system
or an ESS and NSD system.
A host’s configuration depends on the number of network ports that are used or needed. The number of ports with which a node connects to the network depends on the expected bandwidth, the number of subnets that are needed to access other environments, and the high availability requirements.
In TCP/IP environments, scaling bandwidth with multiple ports is commonly achieved by bonding network ports, which is known as link aggregation. But bonding can have some challenging complexities; it makes the deployment complex from the network perspective.
If a bond is used, LACP link aggregation typically needs to be set up in the network. Alternatively, by using IBM Spectrum Scale and ESS building blocks, you can also rely on higher HA layers in IBM Spectrum Scale, such as NSD and recovery group server failover, and can consider skipping bonds in your topology to make the network setup less complex. For better performance and a less complex setup, it is recommended to use a configuration without a bond.
For better availability, use bonded configuration. An example for a configuration without bond is shown in
the following figure.
As shown in the figure, for a network topology without a bond, it is essential to have one IP address per network port. On the adapter that has the mmfsd IP address configured, no additional RoCE IP address or alias is needed as long as the nodes in your cluster are able to communicate with this mmfsd IP address. You can configure as many aliases as the operating system version supports. You need one IP address per adapter, and RoCE can also use the existing IP address that is used for TCP/IP traffic.
Running a configuration without bonds allows you to later enhance the whole GPFS configuration with fabric numbers. By using such a configuration, traffic on the ISLs (inter-switch links) might be avoided.
Note: The adapters in a RoCE enabled environment can run TCP/IP traffic and RDMA traffic
simultaneously.
MTU consideration
RDMA was first introduced on InfiniBand networks. RDMA over InfiniBand supports MTU (maximum
transmission unit) sizes from 256 up to 4096 Bytes.
It is recommended to adjust the MTU to jumbo frames, which is 9000 Bytes. Adjust this setting on all adapters and all nodes communicating in your clusters. If you need to communicate with external nodes in remote networks and you cannot be sure that the path through the network supports an MTU of 9000 Bytes end-to-end, make sure that Path MTU Discovery is enabled. Path MTU Discovery is enabled by default on Red Hat Enterprise Linux.
Tip: The MTU can be set for the bond only when it is being created by using the --create-bond switch. After the bond has been created, the MTU cannot be modified by using the essgennetworks command. The essgennetworks command does not support reconfiguring the MTU for an existing bond or interface. You must use the nmcli command or manually edit the ifcfg network configuration file and change the MTU value. After the MTU changes are applied, reload the new connection configuration by using the nmcli connection reload command and restart the bond or the interface by using the ifdown <interface_name> command followed by ifup <interface_name>.
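A hedged sketch of the manual MTU change described in the tip, assuming the bond is named bond0 and that its configuration file is /etc/sysconfig/network-scripts/ifcfg-bond0:
# vi /etc/sysconfig/network-scripts/ifcfg-bond0   # set or add the line MTU=9000
# nmcli connection reload                          # re-read the modified configuration file
# ifdown bond0 && ifup bond0                       # restart the bond so that the new MTU takes effect
# ip link show bond0 | grep mtu                    # verify the new MTU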
CLI Reference
The CLI reference for essgennetworks is shown as follows:
# essgennetworks --help
usage: essgennetworks [-h] -N NODE-LIST [--prefix PREFIX] [--suffix SUFFIX]
[--interface INTERFACE] [--assignip ASSIGNIP]
[--create-bd | --delete-bond | --add-slave]
optional arguments:
-h, --help show this help message and exit
-N NODE-LIST Provide a list of nodes for bonded network creation.
--prefix PREFIX Provide hostname prefix. Use = between --prefix and
value if the value starts with -.
--suffix SUFFIX Provide hostname suffix. Use = between --suffix and
value if the value starts with -.
--interface INTERFACE
Provide list of interfaces for bond. Default to all
high speed interface if not provided.
--assignip ASSIGNIP Assign IP address to provide interface in --interface
switch.
--create-bond Create bonded interface.
--delete-bond Delete bonded interface.
--add-slave Add slave interfaces to bond.
--gateway GATEWAY Provide gateway for the network. By default it will
not configure any gateway on network interface until
specified.
--bond BONDNAME Provide name of the bond. Default bond0.
--miimon MIIMON Provide miimon value for bond. Default is miimon=100.
Consider miimon=1000 if you are planning to use bond
for RoCE.
--vlan VLAN Set VLAN_ID for the interface or bond.
--mode {balance-rr,active-backup,balance-xor,broadcast,802.3ad,balance-tlb,balance-alb}
Provide bonding mode. Default is 802.3ad
(recommended).
--hash-policy {layer2+3,layer3+4}
Provide xmit hash policy for 802.3ad and balanced-xor.
Default is layer2+3.
--netmask CRID Provide CIDR (netmask) for the interface default /24.
--IPoIB Enable IPoIB (IP over InfiniBand, in case InfiniBand
network present. By default False. It will also
enabler RDMA for InfiniBand.
--query Query the port type of the Mellanox Interface.
--enableRDMA Enable RDMA over the InfiniBand Network.
--enableRoCE Enable RoCE over the Ethernet Network.
--configureRouteForRoCE
Configure routing for RoCE over the Ethernet
Network.Will be used if same subnet has been used for
different RoCE interfaces.
--roceRoutingTableId ROCEROUTINGTABLEID
Routing table ID for the RoCE over the Ethernet
Network.
--roceRoutingTableName ROCEROUTINGTABLENAME
Routing table Name for the RoCE over the Ethernet
Network.
--verbsPortsFabric VERBSPORTSFABRIC
Name of the Mellanox verbs port fabric. For Example: 1
or 2. It will be automatically added to the verbs
port.
--devices DEVICES Name of the Mellanox device name. 'all' will query all
devices attached to node. Provide comman separated
device names to query mode than one device at a given
time.
--change {InfiniBand,Ethernet}
change the Mellanox port type to InfiniBand or
Ethernet and vice versa.
--port {P1,P2} Port number of the Mellanox VPI card.
--mtu {1500,2044,4092,9000}
Provide mtu of bond network. For Ethernet, 1500 or
9000 MTU allowed (Default: 1500). For InfiniBand, 2044
or 4092 MTU allowed (Default: 2044).
--verbose Provides more verbosity.
Note: The GPFS daemon must be recycled to make the RoCE configuration work. After the daemon is recycled, you can run the mmdiag --network command to check whether RDMA is enabled over Ethernet.
Advantages and disadvantages of using bond with RoCE:
• A bonded interface protects against port and cable failures.
• For running RDMA, all ports of the bonded interface need to be on one and the same physical PCI adapter. So, an adapter failure is not covered by such configurations.
• Creating bonds over multiple switches makes MLAG configuration mandatory in a network, which can
cause unbalanced network utilization in the fabric.
As highlighted in black, make sure that only one IP interface has an IP address for the mmfs daemon
communication. All other interfaces need to be on a different subnet.
1. Assign IPv4 to high-speed interface by using the following command:
2. If you are using only one subnet, there is no need to enable routing. For multiple interfaces in the same subnet, you must configure routing so that the RoCE traffic uses all subnets. To enable routing for a subnet, see the “Routing configurations” on page 133 section. In the following example, only one subnet, 192.168.4.0/24, is used. Therefore, you can enable RoCE by using the --enableRoCE option with the --interface switch. However, it is safe to enable routing even if only one subnet is in use, by issuing the following command:
Note: If you want to enable RoCE for both a bond and interfaces, then use the --interface and --bond switches with --enableRoCE to enable it.
The GPFS daemon must be recycled to make the RoCE configuration work. After the daemon is recycled, you can run the mmdiag --network command to check whether RDMA is enabled over Ethernet.
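A hedged example of enabling RoCE on a single interface, based on the switches documented in the CLI reference earlier in this section (the interface name and node name are assumptions):
# essgennetworks -N localhost --interface enP1p8s0f0 --enableRoCE
# mmshutdown -N NodeName; mmstartup -N NodeName    # recycle the GPFS daemon on the node
# mmdiag --network | grep -i rdma                  # confirm that RDMA is enabled over Ethernet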
Tip: The multiple verbsPortsFabric configuration is not supported as of now by the essgennetworks command. For example, if a user wants to configure the fabric for a bond as "/1/1" and for interfaces as "/1/2", the essgennetworks command does not have an option to configure it. The user needs to use the mmchconfig verbsPorts="mlx5_bond_0/1/1 mlx5_2/1/2 mlx5_3/1/2" command to enable the multiple fabrics for RoCE. The GPFS daemon must be recycled after the mmchconfig verbsPorts parameter is changed.
Routing configurations
In Figure 31 on page 132, the mmfsd communication runs in the 192.168.12.0/24 network. All other
Mellanox cards interfaces are highlighted in green and configured to be in the subnet 10.10.10.x/24. All
interfaces are intended to use for RDMA communication, while only the IP addresses in 192.168.12.0/24
network are used for TCP/IP communication.
With current IBM Spectrum Scale releases, only one IP address per node is supported for communication
to the mmfs daemon's network.
To scale out over multiple ports, RoCE can be used. But according to the definition of OFED standards,
each RDMA interface needs to have an IP address to maintain the connection.
Theoretically, to use multiple interfaces, you need to configure multiple subnets. That is complex for larger environments and technically not needed: it is possible to configure all interfaces that are intended to be used for RDMA into one separate subnet.
# ip r
10.111.222.100/30 dev enP1p8s0f3 proto kernel scope link src 10.111.222.101 metric 101
linkdown
192.168.2.0/24 dev bond0 proto kernel scope link src 192.168.2.51 metric 300
Repeat the same steps for all the interfaces to configure multiple RoCE interfaces.
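The --configureRouteForRoCE, --roceRoutingTableId, and --roceRoutingTableName switches of essgennetworks set up this routing for you. As a hedged illustration of the underlying mechanism only, a per-interface routing table for one RoCE interface might look like the following (the interface name, addresses, and table number are assumptions):
# ip route add 10.10.10.0/24 dev enP1p8s0f0 src 10.10.10.11 table 101   # route for this interface
# ip rule add from 10.10.10.11 table 101                                # select the table by source IP
# ip route show table 101                                               # verify the per-interface table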
Note: sysctl settings:
In addition to the interface scripts, you need system-wide settings that are managed by sysctl. Depending on the ESS version in use, a customized sysctl setting is available through a tuned profile named scale. You need to edit the /etc/tuned/scale/tuned.conf file. For any other client node in your cluster, you can deploy the same sysctl configuration file.
Make sure the following sysctl settings are applied:
net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.default.disable_ipv6=0
Note: The IPv6 configuration may be enabled by the default ESS deployment process.
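A hedged sketch of applying the settings through the scale tuned profile mentioned above (the profile path is taken from the text; the [sysctl] section is standard tuned syntax, and you should add the lines to an existing [sysctl] section if one is already present):
# vi /etc/tuned/scale/tuned.conf          # add the two net.ipv6 lines under a [sysctl] section
# tuned-adm profile scale                 # reapply the profile so that the sysctl settings take effect
# sysctl net.ipv6.conf.all.disable_ipv6   # verify the setting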
Validation:
Run mmlsconfig to validate the verbsPorts configuration:
# mmlsconfig
[ems5-ce,essio51-ce,essio52-ce]
verbsRdma enable
verbsRdmaCm enable
verbsRdmaSend yes
[ems5-ce,essio52-ce]
verbsPorts mlx5_bond_0
[essio51-ce]
verbsPorts mlx5_2
In addition, you can run the mmdiag --network command and check for the verbsRDMA connection.
2. To check whether nodes are configured, issue the following command on EMS:
3. After a system is added to the console server, do not use the IPMI console to connect the serial over
LAN (SOL) to configured nodes. Access the nodes by issuing the following command:
In addition to accessing the console from the EMS, the EMS keeps a file for each node with all the output of each server in the /var/log/goconserver/nodes directory. By issuing the following commands, you can get the log files:
# pwd
/var/log/goconserver/nodes
# ls -ltr
total 640
b. To create a GUI user, issue the following command from the EMS host (not from the container):
For more information about GUI configuration, see Chapter 4, “ESS new deployment instructions,”
on page 27.
2. Rerun the GUI configuration (you might need to wipe the GUI database clean first).
# cleanup GUI db
#!/bin/bash
echo "Stopping GUI..."
systemctl stop gpfsgui
echo "Cleaning database..."
psql postgres postgres -c "drop schema fscc cascade;"
echo "Cleaning CCR files..."
mmccr fdel _gui.settings
mmccr fdel _gui.user.repo
mmccr fdel _gui.keystore_settings
mmccr fdel _gui.policysettings
mmccr fdel _gui.dashboards
mmccr fdel _gui.notification
mmccr fdel gui_jobs
mmccr fdel gui
echo "Cleaning local CCR files..."
rm -f /var/lib/mmfs/gui/*.json*
echo "Cleaning logs..."
rm -rf /var/log/cnlog/mgtsrv/*
echo "Starting GUI..."
systemctl start gpfsgui
echo "Finished"
Starting a container
1. Populate /etc/hosts correctly.
2. Fix passwordless SSH between all cluster or ESS nodes (manually).
3. Extract the installation package.
4. Start the container.
5. Issue the following command on all ESS nodes:
You can create network bonds manually from any ESS node by using the essgennetworks command. For more information, see the essgennetworks command in the Elastic Storage System: Command Reference.
4. Create a GPFS cluster.
• Ensure that you use only I/O nodes (ESS 5000, ESS 3000, ESS 3200, or ESS 3500). Do not use EMS
or protocol nodes.
• You can use the --suffix=YourSuffix option in the command.
5. Check whether any nodes have any issues and fix them.
• Ensure that you use only one node in the cluster for the -N option.
• You can use the --suffix=YourSuffix option in the command.
7. Create a file system.
• Ensure that you use only I/O nodes (ESS 5000, ESS 3000, ESS 3200, or ESS 3500).
• You can use the --suffix=YourSuffix option in the command.
For more information, see the essrun command, in the Elastic Storage System: Command Reference.
You can use multiple flags in the command. For example, --bs/--code/--suffix.
Ensure that you use the same BlockSize and RaidCode as the existing file system.
You can use --extra-vars “--nsd-usage dataOnly --storage-pool system” or any
other --nsd-usage/--storage-pool combination.
Note that only one building block is supported for creating the vdisk.
b. Execute in oneESSIONodeInCluster:
• Ensure that you use only I/O nodes (ESS 5000, ESS 3000, ESS 3200, or ESS 3500).
• You can use the --suffix=YourSuffix option in the command.
For more information, see the essrun command, in the Elastic Storage System: Command Reference.
6. Create recovery groups only in the new building block(s).
• Ensure that you use only I/O nodes (ESS 5000, ESS 3000, ESS 3200, or ESS 3500).
• You can use the --suffix=YourSuffix option in the command.
For more information, see the essrun command, in the Elastic Storage System: Command Reference.
3. Update ESS.
Accessibility features
The following list includes the major accessibility features in IBM Spectrum Scale RAID:
• Keyboard-only operation
• Interfaces that are commonly used by screen readers
• Keys that are discernible by touch but do not activate just by touching them
• Industry-standard devices for ports and connectors
• The attachment of alternative input and output devices
IBM Documentation, and its related publications, are accessibility-enabled.
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21,
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
Dept. 30ZA/Building 707
Mail Station P300
2455 South Road,
Poughkeepsie, NY 12601-5400
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
The registered trademark Linux is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Red Hat and Ansible are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the
United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
IBM Privacy Policy
At IBM we recognize the importance of protecting your personal information and are committed to
processing it responsibly and in compliance with applicable data protection laws in all countries in which
IBM operates.
Visit the IBM Privacy Policy for additional information on this topic at https://www.ibm.com/privacy/
details/us/en/.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You can reproduce these publications for your personal, noncommercial use provided that all proprietary
notices are preserved. You cannot distribute, display, or make derivative work of these publications, or
any portion thereof, without the express consent of IBM.
Commercial use
You can reproduce, distribute, and display these publications solely within your enterprise provided
that all proprietary notices are preserved. You cannot make derivative works of these publications, or
reproduce, distribute, or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses, or rights are granted,
either express or implied, to the Publications or any information, data, software or other intellectual
property contained therein.
IBM reserves the right to withdraw the permissions that are granted herein whenever, in its discretion, the
use of the publications is detrimental to its interest or as determined by IBM, the above instructions are
not being properly followed.
You cannot download, export, or reexport this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS
ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT,
AND FITNESS FOR A PARTICULAR PURPOSE.
Glossary
This glossary provides terms and definitions for the IBM Elastic Storage System solution.
The following cross-references are used in this glossary:
• See refers you from a non-preferred term to the preferred term or from an abbreviation to the spelled-
out form.
• See also refers you to a related or contrasting term.
For other terms and definitions, see the IBM Terminology website (opens in new window):
http://www.ibm.com/software/globalization/terminology
B
building block
A pair of servers with shared disk enclosures attached.
BOOTP
See Bootstrap Protocol (BOOTP).
Bootstrap Protocol (BOOTP)
A computer networking protocol that is used in IP networks to automatically assign an IP address to
network devices from a configuration server.
C
CEC
See central processor complex (CPC).
central electronic complex (CEC)
See central processor complex (CPC).
central processor complex (CPC)
A physical collection of hardware that consists of channels, timers, main storage, and one or more
central processors.
cluster
A loosely-coupled collection of independent systems, or nodes, organized into a network for the
purpose of sharing resources and communicating with each other. See also GPFS cluster.
cluster manager
The node that monitors node status using disk leases, detects failures, drives recovery, and selects
file system managers. The cluster manager is the node with the lowest node number among the
quorum nodes that are operating at a particular time.
compute node
A node with a mounted GPFS file system that is used specifically to run a customer job. ESS disks are
not directly visible from and are not managed by this type of node.
CPC
See central processor complex (CPC).
D
DA
See declustered array (DA).
datagram
A basic transfer unit associated with a packet-switched network.
DCM
See drawer control module (DCM).
E
Elastic Storage System (ESS)
A high-performance, GPFS NSD solution made up of one or more building blocks. The ESS software
runs on ESS nodes - management server nodes and I/O server nodes.
encryption key
A mathematical value that allows components to verify that they are in communication with the
expected server. Encryption keys are based on a public or private key pair that is created during the
installation process. See also file encryption key (FEK), master encryption key (MEK).
ESS
See Elastic Storage System (ESS).
environmental service module (ESM)
Essentially, a SAS expander that attaches to the storage enclosure drives. In the case of multiple
drawers in a storage enclosure, the ESM attaches to drawer control modules.
ESM
See environmental service module (ESM).
F
failback
Cluster recovery from failover following repair. See also failover.
failover
(1) The assumption of file system duties by another node when a node fails. (2) The process of
transferring all control of the ESS to a single cluster in the ESS when the other clusters in the ESS fail.
See also cluster. (3) The routing of all transactions to a second controller when the first controller fails.
See also cluster.
failure group
A collection of disks that share common access paths or adapter connection, and could all become
unavailable through a single hardware failure.
FEK
See file encryption key (FEK).
file encryption key (FEK)
A key used to encrypt sectors of an individual file. See also encryption key.
file system
The methods and data structures used to control how data is stored and retrieved.
file system descriptor
A data structure containing key information about a file system. This information includes the disks
assigned to the file system (stripe group), the current state of the file system, and pointers to key files
such as quota files and log files.
G
GPFS cluster
A cluster of nodes defined as being available for use by GPFS file systems.
GPFS portability layer
The interface module that each installation must build for its specific hardware platform and Linux
distribution.
GPFS Storage Server (GSS)
A high-performance, GPFS NSD solution made up of one or more building blocks that runs on System
x servers.
GSS
See GPFS Storage Server (GSS).
H
Hardware Management Console (HMC)
Standard interface for configuring and operating partitioned (LPAR) and SMP systems.
HMC
See Hardware Management Console (HMC).
I
IBM Security Key Lifecycle Manager (ISKLM)
For GPFS encryption, the ISKLM is used as an RKM server to store MEKs.
independent fileset
A fileset that has its own inode space.
indirect block
A block that contains pointers to other blocks.
inode
The internal structure that describes the individual files in the file system. There is one inode for each
file.
inode space
A collection of inode number ranges reserved for an independent fileset, which enables more efficient
per-fileset functions.
Internet Protocol (IP)
The primary communication protocol for relaying datagrams across network boundaries. Its routing
function enables internetworking and essentially establishes the Internet.
I/O server node
An ESS node that is attached to the ESS storage enclosures. It is the NSD server for the GPFS cluster.
IP
See Internet Protocol (IP).
IP over InfiniBand (IPoIB)
Provides an IP network emulation layer on top of InfiniBand RDMA networks, which allows existing
applications to run over InfiniBand networks unmodified.
IPoIB
See IP over InfiniBand (IPoIB).
ISKLM
See IBM Security Key Lifecycle Manager (ISKLM).
J
JBOD array
The total collection of disks and enclosures over which a recovery group pair is defined.
K
kernel
The part of an operating system that contains programs for such tasks as input/output, management
and control of hardware, and the scheduling of user tasks.
L
LACP
See Link Aggregation Control Protocol (LACP).
Link Aggregation Control Protocol (LACP)
Provides a way to control the bundling of several physical ports together to form a single logical
channel.
logical partition (LPAR)
A subset of a server's hardware resources virtualized as a separate computer, each with its own
operating system. See also node.
LPAR
See logical partition (LPAR).
M
management network
A network that is primarily responsible for booting and installing the designated server and compute
nodes from the management server.
management server (MS)
An ESS node that hosts the ESS GUI and is not connected to storage. It must be part of a GPFS cluster.
From a system management perspective, it is the central coordinator of the cluster. It also serves as a
client node in an ESS building block.
master encryption key (MEK)
A key that is used to encrypt other keys. See also encryption key.
N
Network File System (NFS)
A protocol (developed by Sun Microsystems, Incorporated) that allows any host in a network to gain
access to another host or netgroup and their file directories.
Network Shared Disk (NSD)
A component for cluster-wide disk naming and access.
NSD volume ID
A unique 16-digit hexadecimal number that is used to identify and access all NSDs.
node
An individual operating-system image within a cluster. Depending on the way in which the computer
system is partitioned, it can contain one or more nodes. In a Power Systems environment,
synonymous with logical partition.
node descriptor
A definition that indicates how ESS uses a node. Possible functions include: manager node, client
node, quorum node, and non-quorum node.
node number
A number that is generated and maintained by ESS as the cluster is created, and as nodes are added
to or deleted from the cluster.
node quorum
The minimum number of nodes that must be running in order for the daemon to start.
node quorum with tiebreaker disks
A form of quorum that allows ESS to run with as little as one quorum node available, as long as there
is access to a majority of the quorum disks.
non-quorum node
A node in a cluster that is not counted for the purposes of quorum determination.
O
OFED
See OpenFabrics Enterprise Distribution (OFED).
OpenFabrics Enterprise Distribution (OFED)
An open-source software stack that includes software drivers, core kernel code, middleware, and user-
level interfaces.
P
pdisk
A physical disk.
PortFast
A Cisco network function that can be configured to resolve any problems that could be caused by the
amount of time STP takes to transition ports to the Forwarding state.
R
RAID
See redundant array of independent disks (RAID).
RDMA
See remote direct memory access (RDMA).
redundant array of independent disks (RAID)
A collection of two or more physical disk drives that present to the host an image of one or more
logical disk drives. In the event of a single physical device failure, the data can be read or regenerated
from the other disk drives in the array due to data redundancy.
recovery
The process of restoring access to file system data when a failure has occurred. Recovery can involve
reconstructing data or providing alternative routing through a different server.
recovery group (RG)
A collection of disks that is set up by ESS, in which each disk is connected physically to two servers: a
primary server and a backup server.
remote direct memory access (RDMA)
A direct memory access from the memory of one computer into that of another without involving
either one's operating system. This permits high-throughput, low-latency networking, which is
especially useful in massively-parallel computer clusters.
RGD
See recovery group data (RGD).
remote key management server (RKM server)
A server that is used to store master encryption keys.
RG
See recovery group (RG).
recovery group data (RGD)
Data that is associated with a recovery group.
RKM server
See remote key management server (RKM server).
S
SAS
See Serial Attached SCSI (SAS).
secure shell (SSH)
A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on
remote computers.
Serial Attached SCSI (SAS)
A point-to-point serial protocol that moves data to and from such computer storage devices as hard
drives and tape drives.
service network
A private network that is dedicated to managing POWER8 servers. Provides Ethernet-based
connectivity among the FSP, CPC, HMC, and management server.
SMP
See symmetric multiprocessing (SMP).
Spanning Tree Protocol (STP)
A network protocol that ensures a loop-free topology for any bridged Ethernet local-area network. The
basic function of STP is to prevent bridge loops and the broadcast radiation that results from them.
T
TCP
See Transmission Control Protocol (TCP).
Transmission Control Protocol (TCP)
A core protocol of the Internet Protocol Suite that provides reliable, ordered, and error-checked
delivery of a stream of octets between applications running on hosts communicating over an IP
network.
V
VCD
See vdisk configuration data (VCD).
vdisk
A virtual disk.
vdisk configuration data (VCD)
Configuration data that is associated with a virtual disk.
Index
A
accessibility features 145
audience xi
C
call home
   5146 system 51
   5148 System 51
   background 51
   overview 51
   problem report 60
   problem report details 61
Call home
   monitoring 64
   Post setup activities 68
   test 66
   upload data 65
comments xv
D
documentation
   on web xiv
E
Electronic Service Agent
   configuration 57
   Installing 53
   Reinstalling 66
   Uninstalling 66
I
IBM Spectrum Scale
   call home
      monitoring 64
      Post setup activities 68
      test 66
      upload data 65
   Electronic Service Agent 53, 66
   ESA
      configuration 57
      create problem report 60, 61
      problem details 61
information overview xi
L
license inquiries 147
N
notices 147
O
overview
   of information xi
P
patent information 147
preface xi
R
resources
   on web xiv
S
submitting xv
T
trademarks 148
troubleshooting
   call home 51, 53, 66
   call home data upload 65
   call home monitoring 64
   Electronic Service Agent
      problem details 61
      problem report creation 60
   ESA 53, 57, 66
   Post setup activities for call home 68
   testing call home 66
W
web
   documentation xiv
   resources xiv
IBM®
SC27-9859-02