X400 Boot Drive Replacement Guide
Isilon
X400
February, 2016
CAUTION
If this procedure is not followed accurately, data loss and severe disruption of cluster
operations can occur. Perform every step in this procedure; if the system does not
respond as expected, contact Isilon Technical Support.
CAUTION
Perform this procedure on only one node at a time. Performing maintenance on multiple
nodes in parallel may lower the protection level of the cluster, put data at risk, and lead
to the interruption of client workflows.
If you are on the sudoers list, you can run restricted commands by prefixing them with sudo. Compliance mode commands that require changes beyond the sudo prefix are noted in
the procedure steps.
For more information on the sudo program and compliance mode commands, see the
OneFS CLI Administration Guide.
Gather logs
Before you begin any maintenance on a cluster, gather cluster logs.
You must collect cluster logs before all maintenance procedures. Cluster logs provide
snapshots of the cluster, which you can review to make sure that maintenance is
successful.
Procedure
1. Open a secure shell (SSH) connection to any node in the cluster and log in.
2. Gather cluster logs by typing the following command:
isi_gather_info
Note
See the Considerations for installing the latest drive support package section in order
to select the appropriate variant of the package. If you are unable to download the
package, contact EMC Isilon Technical Support for assistance.
3. Open a secure shell (SSH) connection to any node in the cluster and log in.
4. Create or check for the availability of the directory structure /ifs/data/Isilon_Support/dsp.
5. Copy the downloaded file to the dsp directory through SCP, FTP, SMB, NFS, or any
other supported data-access protocols.
6. Unpack the drive support package by running the following command:
tar -zxvf Drive_Support_<version>.tgz
Note
You must run the isi_dsp_install command to install the drive support package.
Do not use the isi pkg command.
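Steps 4 through 6 above can be sketched end to end as a short shell session. This is a minimal sketch that uses a temp directory and a throwaway archive so it can run outside a cluster; on the node, the directory is /ifs/data/Isilon_Support/dsp, the archive is the downloaded Drive_Support_<version>.tgz, and the install itself is done with isi_dsp_install (shown only as a comment, since that command exists only on OneFS nodes).

```shell
# Sketch of the stage-and-unpack flow (steps 4-6), using a throwaway
# archive so it runs anywhere; contents are illustrative only.
workdir=$(mktemp -d)
mkdir -p "$workdir/dsp" && cd "$workdir/dsp"
# Stand-in for the downloaded drive support package:
mkdir pkg && echo demo > pkg/README
tar -czf Drive_Support_demo.tgz pkg && rm -r pkg
# The unpack step, as in the guide:
tar -zxvf Drive_Support_demo.tgz
# On a real node, you would then install with isi_dsp_install
# (never isi pkg), for example:
#   isi_dsp_install Drive_Support_<version>.tgz
ls pkg/README
```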
3. Place the FRU package on the cluster through a network drop, or by asking someone
at the cluster site to place the package for you. If neither of these options is available
to you, contact Isilon Technical Support for assistance.
The boot drives are listed in the left column. In the previous example, both boot drives
are healthy. If one of the boot drives has failed, the drive will not appear in the output.
Make note of whether the failed boot drive is the ada1 or ada0 device, and then use
the following table to determine the location of the boot drive inside the node.
Drive     OneFS Drive ID     Board drive slot
Master    ada1               J4
Slave     ada0               J3
Make note of the board drive slot that contains the failed boot drive. The ada1 drive is
on the right side of the boot carrier card. The ada0 drive is on the left side of the boot
carrier card.
CAUTION
If both drives appear to have failed, do not continue. Contact Isilon Technical Support
immediately.
ATA channel 0:
Master: no device present
Slave: no device present
ATA channel 1:
Master: no device present
Slave: no device present
ATA channel 2:
Master: ad4 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
Slave: no device present
ATA channel 3:
Master: no device present
Slave: ad7 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
ATA channel 4:
Master: no device present
Slave: no device present
ATA channel 5:
Master: no device present
Slave: no device present
The boot drives are listed under ATA channel 2 (master) and ATA channel 3
(slave). In the previous example, both boot drives are healthy. If one of the boot drives
has failed, the display reads no device present for that drive.
Make note of whether the failed boot drive is the ad4 or ad7 device, and then use the
following table to determine the location of the boot drive inside the node.
Drive     OneFS Drive ID     Board drive slot
Master    ad4                J3
Slave     ad7                J4
Make note of the board drive slot that contains the failed boot drive.
CAUTION
If both drives appear to have failed, do not continue. Contact Isilon Product Support
immediately.
3. If both drives appear to be healthy, one of the drives may have partially failed. To
identify a partially failed drive, check the status of the individual partition mirrors by
typing the following command:
gmirror status
From left to right, the output displays the name of each mirror, the status of the mirror
relationship, and the component IDs for each boot drive.
The following example shows the boot drive partition layout in a healthy node. The
mirrors for each partition show:
- a value of COMPLETE in the Status column.
- the component IDs for both boot drives in the Components column. The component IDs are a combination of the OneFS Drive ID and the partition number (the number following the letter p). Both boot drives are listed for each mirror with the exception of the var-crash mirror, which only lists the slave drive.
Note
If you are running OneFS 8.0 or later, your OneFS Drive IDs will display as ada0 or ada1. The partition numbers in the display may differ from the following example.
The following example shows the boot drive partition layout as it appears in the event
of a failed boot drive. A failed boot drive forces the mirrors for a partition to show:
- a value of DEGRADED in the Status column.
- only the component ID of the healthy boot drive in the Components column. The failed boot drive does not appear.
Attention
DEGRADED does not refer to a specific drive, but to the mirror relationship between
the drives. If a drive appears in the Components column next to the DEGRADED status,
it is healthy and should not be removed.
In the previous example, ad7p4 is missing from the degraded partition mirror/root0, and ad7p6 is missing from the degraded partition mirror/var0. The missing drive, ad7, is the failed drive.
Determine which drive has failed. Use the previous table to determine which board
drive slot contains the failed boot drive and make a note of the number (J3 or J4).
Attention
If both drives have failed, do not continue. Contact Isilon Product Support.
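The DEGRADED check described above can be scripted as a quick filter. The sample listing below is an illustrative, simplified gmirror status layout (one line per mirror, with component IDs matching the ad7-failure example); it is not output captured from a real node.

```shell
# Filter a gmirror-style status listing for DEGRADED mirrors.
# The sample layout is simplified and illustrative only.
cat > /tmp/gmirror_sample.txt <<'EOF'
         Name    Status  Components
 mirror/root0  DEGRADED  ad4p4
  mirror/var0  DEGRADED  ad4p6
mirror/kernel  COMPLETE  ad4p2 ad7p2
EOF
# Print only the mirrors whose Status column reads DEGRADED; the drive
# absent from their Components column is the failed one.
awk '$2 == "DEGRADED" { print $1 }' /tmp/gmirror_sample.txt
```

Here the filter prints mirror/root0 and mirror/var0; because only ad4 partitions remain in those mirrors, ad7 is the failed drive, as in the example above.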
3. From the node that you connected to, open a secure shell (SSH) connection to the
node that is to be shut down by typing the command:
ssh <node_ip_address>
If the node does not respond to the shutdown command, press the Power button on
the node three times, and then wait five minutes. If the node still does not shut down,
you are at risk of losing data. Do not proceed. Contact EMC Isilon Technical Support
for assistance.
CAUTION
A forced power down should be attempted only if a node is unresponsive. Forcing the
power down of a healthy node can result in data loss.
5. Verify that the node is powered down by typing the following command:
isi status -q
Confirm that the node has a status of D--R (Down, Read Only). See node 3 in the
following example.
Note
If there are transceivers connected to the ends of your IB or Ethernet cables, make sure
to remove them with the cables. If you are using fiber Ethernet cables, you will need to
disconnect the cable from the transceiver, and then remove the transceiver from the node.
DANGER
Slide the node out from the rack slowly. Do not extend the rails completely until you
confirm that the node is latched and safely secured to the rails.
WARNING
Properly ground yourself to prevent electrostatic discharge from damaging the node. For
example, attach an ESD strap to your wrist and the node chassis.
Procedure
1. Loosen the captive screw that secures the node top panel.
2. Slide the top panel toward the rear of the node, and then lift the top panel to access
the node interior.
3. Remove the cross bracket by pressing on the side of the node chassis where the cross
bracket is connected. Unhook the cross bracket from the chassis, and then lift straight up
to unhook the other side of the bracket.
(Figure callout: 1. Boot drive)
Procedure
1. Locate the two board drive slots that contain the boot drives. The slots are labeled J3
and J4. Gently pull the failed boot drive from the board drive slot.
(Figure callouts: 1. J3 connector; 2. J4 connector)
2. Insert the replacement boot drive into the empty boot drive slot and gently press
down to secure the drive.
WARNING
The cross bracket sits directly above the boot drives. Use caution when installing the
cross bracket so that the boot drives are not dislodged or damaged.
WARNING
The chassis intrusion switch can be damaged if the top panel is slid too far back on
the node.
2. Tighten the captive top panel screw to secure the top panel to the node.
WARNING
Slide the node slowly so you do not slam the node into the rack and damage the
node.
2. Reconnect the Ethernet, InfiniBand, and power cables to the back of the node.
3. Secure the node to the rack cabinet.
4. Replace the node front panel.
If the sentinel file appears, you replaced the correct boot drive. If the file is missing,
do not continue. Contact Isilon Product Support.
2. Remove the file by typing the following command:
rm /sentinel.txt
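The sentinel-file round trip (create before shutdown, verify after reboot, then remove) can be sketched as follows. The sketch uses a temp path so it runs anywhere; the guide itself uses /sentinel.txt on the node.

```shell
# Sentinel-file sketch: the file is created before shutdown and, if the
# correct (failed) boot drive was replaced, it survives on the healthy
# drive after reboot.
sentinel=/tmp/sentinel.txt
touch "$sentinel"            # done before shutting the node down
# ...boot drive replaced, node rebooted...
if [ -f "$sentinel" ]; then
  echo "sentinel present: correct boot drive replaced"
  rm "$sentinel"             # the cleanup step shown above
else
  echo "sentinel missing: stop and contact Isilon Technical Support"
fi
```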
Note
If you are running OneFS 8.0 or later, your OneFS Drive IDs will display as ada0 or ada1. The partition numbers in the display may differ from the following example.
Confirm that the values in the Status column all read COMPLETE.
2. Verify boot drive information. Depending on your version of OneFS, run one of the
following commands:
OneFS 8.0 or later
camcontrol devlist | grep ad
ATA channel 0:
Master: no device present
Slave: no device present
ATA channel 1:
Master: no device present
Slave: no device present
ATA channel 2:
Master: ad4 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
Slave: no device present
ATA channel 3:
Master: no device present
Slave: ad7 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
ATA channel 4:
Master: no device present
Slave: no device present
ATA channel 5:
Master: no device present
Slave: no device present
Note
If your cluster is running in SmartLock compliance mode with OneFS 7.0.2.10 or later,
7.0.1.4 or later, or 7.1.1.0 or later, you will need to enter the provided compliance mode
commands to run the FRU scripts. If your cluster is running in compliance mode but is not
running one of these versions, you will need to upgrade your OneFS version to support
the compliance mode commands. Contact Isilon Technical Support.
The update is verified and a series of status messages confirms the node configuration.
If an FTP connection is available, an updated ABR is sent to Isilon Technical Support.
2. If an external connection is not available, manually collect the updated ABR and
deliver it to Isilon Technical Support.
3. If the cluster is running in SmartLock compliance mode, verify installation of the
updated hardware by running the following command:
sudo /usr/bin/isi_hwtools/isi_cto_update --abr --filepath .
Generate an ABR
You can manually send an As Built Record (ABR) by copying an XML file from the node
and emailing the file to Isilon Technical Support. You need network access to the node, or
you can request that the customer provide the file to you.
Procedure
1. Generate an ABR by running the following command:
isi_make_abr
3. Place the ABR file where you can copy it by running the following command:
isi_inventory_tool --display --itemType asbuilt > /ifs/asbuilt_<serial-number>_<date-time-stamp>.xml
2. Delete the FRU package from the node. Depending on your version of OneFS, run one
of the following commands:
OneFS 8.0 or later
isi upgrade patches uninstall IsiFru_Package_<date-time-stamp>
Note
Do not restart or power off nodes while drive firmware is being updated on the cluster.
Procedure
1. Open a secure shell (SSH) connection to any node in the cluster and log in.
2. Depending on your version of OneFS, run one of the following commands to update
the drive firmware for your cluster:
OneFS 8.0 or later
To update the drive firmware for your entire cluster, run the following command:
isi devices drive firmware update start all --node-lnn all
To update the drive firmware for a specific node only, run the following
command:
isi devices drive firmware update start all --node-lnn <node-number>
CAUTION
You must wait for one node to finish updating before you initiate an update on the
next node. To confirm that a node has finished updating, run the following command:
isi devices -d <node-number>
A drive that is still updating will display a status of FWUPDATE.
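The wait-for-completion rule in the caution above can be sketched as a polling loop. The isi devices call is replaced here by a stub (fake_status) so the sketch is self-contained and runnable anywhere; on a cluster you would poll isi devices -d <node-number> and check for FWUPDATE instead.

```shell
# Poll until the (stubbed) drive status no longer reads FWUPDATE.
n=0
fake_status() {
  # Stub standing in for: isi devices -d <node-number>
  if [ "$n" -lt 3 ]; then echo "FWUPDATE"; else echo "HEALTHY"; fi
}
status=$(fake_status)
while [ "$status" = "FWUPDATE" ]; do
  n=$((n + 1))
  echo "poll $n: node still updating"
  # sleep 30   # on a real cluster, wait between polls
  status=$(fake_status)
done
echo "node finished updating; safe to start the next node"
```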
Gather logs
After you complete maintenance on a cluster, gather cluster logs.
You must collect cluster logs after all maintenance. Cluster logs provide snapshots of the
cluster that you can review to make sure that maintenance is successful.
Procedure
1. Gather cluster logs by typing the command:
isi_gather_info
Help with Online Support: For questions specific to EMC Online Support registration or access, email support@emc.com.
Isilon Info Hubs: For the list of Isilon info hubs, see the Isilon Info Hubs page on the EMC Isilon Community Network. Isilon info hubs organize Isilon documentation, videos, blogs, and user-contributed content into topic areas, making it easy to find content about subjects that interest you.
Copyright © 2016 EMC Corporation. All rights reserved. Published in the USA.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software
license.
EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries.
All other trademarks used herein are the property of their respective owners.
For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).