SPARC Enterprise™ T5440 Server

Service Manual

Manual Code C120-E512-02EN


Part No. 875-4392-11
July 2009, Revision A
Copyright © 2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
FUJITSU LIMITED provided technical input and review on portions of this material.
Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in
this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws
and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this
document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more
additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution,
and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior
written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to
you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this
document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of
either of them.
This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by
and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.
Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End
User. Please contact Fujitsu Limited or Sun Microsystems, Inc.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun™, Sun Microsystems™, the Sun logo©, Java™, Netra™, Solaris™, Sun StorageTek™, docs.sun.comSM, OpenBoot™, SunVTS™, Sun Fire™,
SunSolveSM, CoolThreads™, and J2EE™, are trademarks or registered trademarks of Sun Microsystems, Inc. or its subsidiaries in the U.S. and
other countries.
Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited.
All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.
Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc.
SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited.
SSH is a registered trademark of SSH Communications Security in the United States and in certain other jurisdictions.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of
Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.
Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this
document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or
technology is provided.
EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC. AND THEIR AFFILIATES
MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR
TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH
DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun
Microsystems, Inc. or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of
use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such
damages.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

Preface xiii

Identifying Server Components 1


Infrastructure Boards and Cables 1
Front Panel Diagram 3
Front Panel LEDs 4
Rear Panel Diagram 5
Rear Panel LEDs 7
Ethernet Port LEDs 8

Managing Faults 9
Understanding Fault Handling Options 9
Server Diagnostics Overview 9
Diagnostic Flowchart 11
Options for Accessing the Service Processor 14
ILOM Overview 15
ALOM CMT Compatibility Shell Overview 17
Solaris Predictive Self-Healing Overview 17
SunVTS Overview 18
POST Fault Management Overview 19
POST Fault Management Flowchart 20
Memory Fault Handling Overview 21
Connecting to the Service Processor 22

▼ Switch From the System Console to the Service Processor (ILOM or ALOM CMT Compatibility Shell) 23
▼ Switch From ILOM to the System Console 23
▼ Switch From the ALOM CMT Compatibility Shell to the System Console 24
Displaying FRU Information with ILOM 24
▼ Display System Components With the ILOM show components Command 24
▼ Display Individual Component Information With the ILOM show Command 25
Controlling How POST Runs 26
▼ Change POST Parameters 27
▼ Run POST in Maximum Mode 28
Detecting Faults 30
Detecting Faults Using LEDs 30
Detecting Faults Using ILOM show faulty Command 32
▼ Detect Faults Using the ILOM show faulty Command 33
Detecting Faults Using Solaris OS Files and Commands 35
▼ Check the Message Buffer 36
▼ View System Message Log Files 36
Detecting Faults Using the ILOM Event Log 37
▼ View ILOM Event Log 37
Detecting Faults Using SunVTS Software 38
▼ Verify Installation of SunVTS Software 38
▼ Start the SunVTS Browser Environment 39
SunVTS Software Packages 41
Useful SunVTS Tests 42
Detecting Faults Using POST 42
Identifying Faults Detected by PSH 44

▼ Detect Faults Identified by the Solaris PSH Facility Using the ILOM fmdump Command 45
Clearing Faults 47
▼ Clear Faults Detected During POST 48
▼ Clear Faults Detected by PSH 49
▼ Clear Faults Detected in the External I/O Expansion Unit 50
Disabling Faulty Components 50
▼ Disable System Components 52
▼ Re-Enable System Components 52
ILOM-to-ALOM CMT Command Reference 53

Preparing to Service the System 59


Safety Information 59
Safety Symbols 60
Electrostatic Discharge Safety Measures 61
Antistatic Wrist Strap 61
Antistatic Mat 61
Required Tools 62
▼ Obtain the Chassis Serial Number 62
▼ Obtain the Chassis Serial Number Remotely 62
Powering Off the System 63
▼ Power Off From the Command Line 64
▼ Power Off – Graceful Shutdown 64
▼ Power Off – Emergency Shutdown 65
▼ Disconnect Power Cords From the Server 65
Extending the Server to the Maintenance Position 65
▼ Extend the Server to the Maintenance Position 66
Removing the Server From the Rack 67
▼ Remove the Server From the Rack 67

Performing Electrostatic Discharge – Antistatic Prevention Measures 69
▼ Perform Electrostatic Discharge – Antistatic Prevention Measures 69
Removing the Top Cover 69
▼ Remove the Top Cover 69

Servicing Customer-Replaceable Units 71


Hot-Pluggable and Hot-Swappable Devices 72
Servicing Hard Drives 72
▼ Remove a Hard Drive (Hot-Plug) 73
▼ Install a Hard Drive (Hot-Plug) 75
▼ Remove a Hard Drive 77
▼ Install a Hard Drive 78
Hard Drive Device Identifiers 79
Hard Drive LEDs 80
Servicing Fan Trays 81
▼ Remove a Fan Tray (Hot-Swap) 81
▼ Install a Fan Tray (Hot-Swap) 82
▼ Remove a Fan Tray 83
▼ Install a Fan Tray 84
Fan Tray Device Identifiers 84
Fan Tray Fault LED 84
Servicing Power Supplies 85
▼ Remove a Power Supply (Hot-Swap) 86
▼ Install a Power Supply (Hot-Swap) 87
▼ Remove a Power Supply 89
▼ Install a Power Supply 90
Power Supply Device Identifiers 91
Power Supply LED 91
Servicing PCIe Cards 92

▼ Remove a PCIe Card 93
▼ Install a PCIe Card 94
▼ Add a PCIe Card 94
PCIe Device Identifiers 96
PCIe Slot Configuration Guidelines 97
Servicing CMP/Memory Modules 98
▼ Remove a CMP/Memory Module 99
▼ Install a CMP/Memory Module 100
▼ Add a CMP/Memory Module 101
CMP and Memory Module Device Identifiers 103
Supported CMP/Memory Module Configurations 104
Servicing FB-DIMMs 104
▼ Remove FB-DIMMs 105
▼ Install FB-DIMMs 105
▼ Verify FB-DIMM Replacement 106
▼ Add FB-DIMMs 109
Supported FB-DIMM Configurations 110
FB-DIMM Device Identifiers 112
FB-DIMM Fault Button Locations 113

Servicing Field-Replaceable Units 115


Servicing the Front Bezel 115
▼ Remove the Front Bezel 116
▼ Install the Front Bezel 117
Servicing the DVD-ROM Drive 118
▼ Remove the DVD-ROM Drive 118
▼ Install the DVD-ROM Drive 119
Servicing the Service Processor 120
▼ Remove the Service Processor 120

▼ Install the Service Processor 122
Servicing the IDPROM 123
▼ Remove the IDPROM 123
▼ Install the IDPROM 124
Servicing the Battery 125
▼ Remove the Battery 125
▼ Install the Battery 125
Servicing the Power Distribution Board 126
▼ Remove the Power Distribution Board 126
▼ Install the Power Distribution Board 128
Servicing the Fan Tray Carriage 129
▼ Remove the Fan Tray Carriage 129
▼ Install the Fan Tray Carriage 131
Servicing the Hard Drive Backplane 132
▼ Remove the Hard Drive Backplane 132
▼ Install the Hard Drive Backplane 133
Servicing the Motherboard 135
▼ Remove the Motherboard 135
▼ Install the Motherboard 138
Motherboard Fastener Locations 139
Servicing the Flex Cable Assembly 140
▼ Remove the Flex Cable Assembly 141
▼ Install the Flex Cable Assembly 142
Servicing the Front Control Panel 144
▼ Remove the Front Control Panel 144
▼ Install the Front Control Panel 145
Servicing the Front I/O Board 146
▼ Remove the Front I/O Board 147

▼ Install the Front I/O Board 148

Returning the Server to Operation 149


▼ Install the Top Cover 150
▼ Install the Server Into the Rack 150
▼ Slide the Server Into the Rack 151
▼ Connect the Power Cords to the Server 153
▼ Power On the Server 153

Performing Node Reconfiguration 155


I/O Connections to CMP/Memory Modules 156
Recovering from a Failed CMP/Memory Module 157
Reconfiguring I/O Device Nodes 158
▼ Reconfigure the I/O and PCIe Fabric 158
▼ Temporarily Disable All Memory Modules 160
▼ Re-Enable All Memory Modules 161
▼ Reset the LDoms Guest Configuration 162
System Bus Topology 162
I/O Fabric in 2P Configuration 164
I/O Fabric in 4P Configuration 165

Connector Pinouts 167


Serial Management Port Connector Pinouts 167
Network Management Port Connector Pinouts 168
Serial Port Connector Pinouts 169
USB Connector Pinouts 169
Gigabit Ethernet Connector Pinouts 170

Server Components 173


Customer-Replaceable Units 174

Field-Replaceable Units 176

Index 179

Preface

This manual provides detailed procedures that describe the removal and replacement
of replaceable parts in the SPARC Enterprise™ T5440 Server. This manual also
includes information about the use and maintenance of the server. This document is
written for technicians, system administrators, authorized service providers (ASPs),
and users who have advanced experience troubleshooting and replacing hardware.

For Safe Operation


This manual contains important information regarding the use and handling of this
product. Read this manual thoroughly. Pay special attention to the section “Notes on
Safety” on page xix. Use the product according to the instructions and information
available in this manual. Keep this manual handy for further reference.

Fujitsu makes every effort to prevent users and bystanders from being injured or
from suffering damage to their property.

Before You Read This Document


To fully use the information in this document, you must have thorough knowledge of
the topics discussed in the SPARC Enterprise T5440 Server Product Notes.

Structure and Contents of This Manual
This manual is organized as described below:
■ “Identifying Server Components” on page 1
Provides an overview of the server, including major boards and components, as
well as front and rear panel features.
■ “Managing Faults” on page 9
Describes the diagnostics that are available for monitoring and troubleshooting
the server.
■ “Preparing to Service the System” on page 59
Describes the steps necessary to prepare the server for service.
■ “Servicing Customer-Replaceable Units” on page 71
Describes how to service customer-replaceable units (CRUs).
■ “Servicing Field-Replaceable Units” on page 115
Describes how to service field-replaceable units (FRUs).
■ “Returning the Server to Operation” on page 149
Describes how to bring the server back to operation after performing service
procedures.
■ “Performing Node Reconfiguration” on page 155
Describes how to perform node reconfiguration.
■ “Connector Pinouts” on page 167
Contains pinout tables for all external connectors.
■ “Server Components” on page 173
Contains illustrations showing server components.

Related Documentation
The latest versions of all the SPARC Enterprise Series manuals are available at the
following Web sites:

Global Site

(http://www.fujitsu.com/sparcenterprise/manual/)

Japanese Site

(http://primeserver.fujitsu.com/sparcenterprise/manual/)

Title | Description | Manual Code

SPARC Enterprise T5440 Server Getting Started Guide | Minimum steps to power on and boot the server for the first time | C120-E504
SPARC Enterprise T5440 Server Product Notes | Information about the latest product updates and issues | C120-E508
Important Safety Information for Hardware Systems | Safety information that is common to all SPARC Enterprise series servers | C120-E391
SPARC Enterprise T5440 Server Safety and Compliance Guide | Safety and compliance information that is specific to the server | C120-E509
SPARC Enterprise/PRIMEQUEST Common Installation Planning Manual | Requirements and concepts of installation and facility planning for the setup of SPARC Enterprise and PRIMEQUEST | C120-H007
SPARC Enterprise T5440 Server Site Planning Guide | Server specifications for site planning | C120-H029
SPARC Enterprise T5440 Server Installation and Setup Guide | Detailed rackmounting, cabling, power on, and configuring information | C120-E510
SPARC Enterprise T5440 Server Service Manual | How to run diagnostics to troubleshoot the server, and how to remove and replace parts in the server | C120-E512
SPARC Enterprise T5440 Server Administration Guide | How to perform administrative tasks that are specific to the server | C120-E511
Integrated Lights Out Manager 2.0 User’s Guide | Information that is common to all platforms managed by Integrated Lights Out Manager (ILOM) 2.0 | C120-E474
Integrated Lights Out Manager 2.0 Supplement for SPARC Enterprise T5440 Server | How to use the ILOM 2.0 software on the server | C120-E513
Integrated Lights Out Manager 3.0 Concepts Guide | Information that describes ILOM 3.0 features and functionality | C120-E573
Integrated Lights Out Manager 3.0 Getting Started Guide | Information and procedures for network connection, logging in to ILOM 3.0 for the first time, and configuring a user account or a directory service | C120-E576
Integrated Lights Out Manager 3.0 Web Interface Procedures Guide | Information and procedures for accessing ILOM 3.0 functions using the ILOM web interface | C120-E574
Integrated Lights Out Manager 3.0 CLI Procedures Guide | Information and procedures for accessing ILOM 3.0 functions using the ILOM CLI | C120-E575
Integrated Lights Out Manager 3.0 SNMP and IPMI Procedure Guide | Information and procedures for accessing ILOM 3.0 functions using SNMP or IPMI management hosts | C120-E579
Integrated Lights Out Manager 3.x Feature Updates and Release Notes | Enhancements that have been made to ILOM firmware since the ILOM 3.0 release | C120-E600
Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server | How to use the ILOM 3.0 software on the server | C120-E587
External I/O Expansion Unit Installation and Service Manual | Procedures for installing the External I/O Expansion Unit on the SPARC Enterprise T5120/T5140/T5220/T5240/T5440 servers | C120-E543
External I/O Expansion Unit Product Notes | Important and late-breaking information about the External I/O Expansion Unit | C120-E544

Note – Product Notes are available on the website only. Check the website for the
most recent update for your product.

UNIX Commands
This document might not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices. Refer to the following for this information:
■ Software documentation that you received with your system
■ Solaris™ Operating System documentation, which is at
(http://docs.sun.com)

Text Conventions
Typeface* | Meaning | Examples

AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with on-screen computer output | % su Password:
AaBbCc123 | Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. | Read Chapter 6 in the User’s Guide. These are called class options. To delete a file, type rm filename.

* The settings on your browser might differ from these settings.

Prompt Notations
The following prompt notations are used in this manual.

Shell | Prompt Notation

C shell | machine-name%
C shell superuser | machine-name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell superuser | #
ILOM service processor | ->
ALOM compatibility shell | sc>
OpenBoot PROM firmware | ok
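
The prompt notations above can be read as a simple lookup: the trailing symbol identifies the environment that will interpret a command. The following Python sketch is illustrative only and is not part of the server software. The function name classify_prompt is hypothetical, and the matching rules assume that a C shell prompt always carries a machine name in front of the % or # symbol, as the table shows.

```python
def classify_prompt(prompt: str) -> str:
    """Map a prompt string to the environment named in the prompt table.

    Hypothetical helper for illustration; the labels are copied from the
    manual's table, the matching logic is an assumption.
    """
    p = prompt.strip()

    # Prompts that appear in the table exactly as written.
    exact = {
        "->": "ILOM service processor",
        "sc>": "ALOM compatibility shell",
        "ok": "OpenBoot PROM firmware",
        "$": "Bourne shell and Korn shell",
        "#": "Bourne shell and Korn shell superuser",
    }
    if p in exact:
        return exact[p]

    # C shell prompts are shown as machine-name% and machine-name#,
    # so they are longer than the bare symbol.
    if len(p) > 1 and p.endswith("%"):
        return "C shell"
    if len(p) > 1 and p.endswith("#"):
        return "C shell superuser"

    return "unknown"


if __name__ == "__main__":
    for sample in ["->", "sc>", "ok", "myhost%", "myhost#", "$", "#"]:
        print(f"{sample!r}: {classify_prompt(sample)}")
```

A bare # is treated as the Bourne or Korn shell superuser prompt because the table lists the C shell superuser prompt only with a machine name prefix. A site that customizes its shell prompts would need different rules.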

Conventions for Alert Messages


This manual uses the following conventions to show alert messages, which are
intended to prevent injury to the user or bystanders as well as property damage, and
important messages that are useful to the user.

Warning – This indicates a hazardous situation that could result in death or serious
personal injury (potential hazard) if the user does not perform the procedure
correctly.

Caution – This indicates a hazardous situation that could result in minor or
moderate personal injury if the user does not perform the procedure correctly. This
signal also indicates that damage to the product or other property may occur if the
user does not perform the procedure correctly.

Caution – This indicates that surfaces are hot and might cause personal injury if
touched. Avoid contact.

Caution – This indicates that hazardous voltages are present. To reduce the risk of
electric shock and danger to personal health, follow the instructions.

Tip – This indicates information that could help the user to use the product more
effectively.

Alert Messages in the Text


An alert message in the text consists of a signal indicating an alert level followed by
an alert statement. A space of one line precedes and follows an alert statement.

Caution – The following tasks regarding this product and the optional products
provided by Fujitsu should only be performed by a certified service engineer.
Users must not perform these tasks. Incorrect operation of these tasks may cause
malfunction.

Also, important alert messages are shown in “Important Alert Messages” on
page xix.

Notes on Safety

Important Alert Messages


This manual provides the following important alert signals:

Caution – This indicates a hazardous situation that could result in minor or moderate
personal injury if the user does not perform the procedure correctly. This signal also
indicates that damage to the product or other property may occur if the user does not
perform the procedure correctly.

Task Warning

Maintenance Damage
Two people must dismount and carry the chassis.

The weight of the server on extended slide rails can be enough to
overturn an equipment rack. Before you begin, deploy the antitilt feature
on your cabinet.

The server weighs approximately 88 lb (40 kg). Two people are required
to lift and mount the server into a rack enclosure when using the
procedures in this chapter.

Caution – This indicates that hazardous voltages are present. To reduce the risk of
electric shock and danger to personal health, follow the instructions.

Task Warning

Maintenance Electric shock


Never attempt to run the server with the covers removed. Hazardous
voltage present.

Because 3.3V standby power is always present in the system, you must
unplug the power cords before accessing any cold-serviceable
components.

Caution – This indicates that surfaces are hot and might cause personal injury if
touched. Avoid contact.

Task Warning

Maintenance Extremely hot


FB-DIMMs may be hot. Use caution when servicing FB-DIMMs.

Product Handling

Maintenance

Warning – Certain tasks in this manual should only be performed by a certified
service engineer. Users must not perform these tasks. Incorrect operation of these
tasks may cause electric shock, injury, or fire.

■ Installation and reinstallation of all components, and initial settings
■ Removal of front, rear, or side covers
■ Mounting/de-mounting of optional internal devices
■ Plugging or unplugging of external interface cards
■ Maintenance and inspections (repairing, and regular diagnosis and maintenance)

Caution – The following tasks regarding this product and the optional products
provided by Fujitsu should only be performed by a certified service engineer.
Users must not perform these tasks. Incorrect operation of these tasks may cause
malfunction.

■ Unpacking optional adapters and similar packages delivered to users
■ Plugging or unplugging of external interface cards

Remodeling/Rebuilding

Caution – Do not make mechanical or electrical modifications to the equipment.
Using this product after modifying or reproducing by overhaul may cause
unexpected injury or damage to the property of the user or bystanders.

Alert Label
The following is a label attached to this product:
■ Never peel off the label.
■ The following label provides information to the users of this product.

Fujitsu Welcomes Your Comments
If you have any comments or requests regarding this document, or if you find any
unclear statements in the document, please state your points specifically on the form
at the following URL.

For Users in U.S.A., Canada, and Mexico:

(https://download.computers.us.fujitsu.com/)

For Users in Other Countries:

(http://www.fujitsu.com/global/contact/computing/sparce_index.html)

Identifying Server Components

This section provides an overview of the server, including major boards and
components, as well as front and rear panel features.

Description | Links

Overview of the infrastructure boards and cables in the server | “Infrastructure Boards and Cables” on page 1
Overview of front panel features | “Front Panel Diagram” on page 3; “Front Panel LEDs” on page 4
Overview of rear panel features | “Rear Panel Diagram” on page 5; “Rear Panel LEDs” on page 7; “Ethernet Port LEDs” on page 8

Related Information
■ “Server Components” on page 173

Infrastructure Boards and Cables


The SPARC Enterprise T5440 server is based on a 4U chassis.

The SPARC Enterprise T5440 server has the following boards installed in the chassis:
■ Motherboard – The motherboard includes slots for up to four CMP modules and
four memory modules, memory control subsystem, up to eight PCIe expansion
slots, and a service processor slot. The motherboard also contains a top cover
safety interlock (“kill”) switch.

Note – 10-Gbit Ethernet XAUI cards are shared in Slots 4 and 5.

■ CMP module – Each CMP module contains an UltraSPARC T2 Plus chip, slots for
four FB-DIMMs, and associated DC-DC converters.

■ Memory module – A memory module containing slots for an additional 12
FB-DIMMs is associated with each CMP module.
■ Service processor – The service processor (ILOM) board controls the server
power and monitors server power and environmental events. The service
processor draws power from the server’s 3.3V standby supply rail, which is
available whenever the system is receiving main input power, even when the
system is turned off.
A removable IDPROM contains MAC addresses, host ID, and ILOM and
OpenBoot PROM configuration data. When replacing the service processor, the
IDPROM can be transferred to a new board to retain system configuration data.
■ Power supply backplane – This board distributes main 12V power from the
power supplies to the rest of the system. The power supply backplane is
connected to the motherboard and the disk drive backplane via a flex cable. High
voltage power is provided to the motherboard via a bus bar assembly.
■ Hard drive backplane – This board includes the connectors for up to four hard
drives. It is connected to the motherboard via a flex cable assembly.
Each drive has its own Power/Activity, Fault, and Ready-to-Remove LEDs.
■ Front control panel – This board connects directly to the motherboard, and serves
as the interconnect for the front I/O board. It contains the front panel LEDs and
the Power button.
■ Front I/O board – This board connects to the front control panel interconnect. It
contains two USB ports.
■ Flex cable assembly – The flex cable assembly serves as the interconnect between
the power supply backplane, motherboard, hard drive backplane, and DVD-ROM
drive.
■ Power supply backplane I2C cable – This cable transmits power supply status to
the motherboard.

Related Information
■ SPARC Enterprise T5440 Server Site Planning Guide.
■ “Managing Faults” on page 9
■ “Servicing Customer-Replaceable Units” on page 71
■ “Servicing Field-Replaceable Units” on page 115

Front Panel Diagram
The server front panel contains a recessed system Power button, system status and
fault LEDs, and a Locator button and LED. The front panel also provides access to
the internal hard drives, the DVD-ROM drive (if equipped), and the two front USB ports.

FIGURE: Front Panel Features on page 3 shows front panel features on the SPARC
Enterprise T5440 server. For a detailed description of front panel controls and LEDs,
see “Front Panel LEDs” on page 4.

FIGURE: Front Panel Features

Figure Legend

1 Locator Button/LED      5 Component Fault LEDs
2 Service Required LED    6 DVD-ROM Drive
3 Power/OK LED            7 USB Ports
4 Power Button            8 Hard Drives

Related Information
■ “Front Panel LEDs” on page 4

■ “Rear Panel Diagram” on page 5
■ “Servicing the Front Bezel” on page 115

Front Panel LEDs


See TABLE: Front Panel LEDs and Controls on page 4 for a description of the front
panel system LEDs and controls.

TABLE: Front Panel LEDs and Controls

Each entry lists the LED or button (with its color and panel label, if any),
followed by its description.

Locator LED and button (white)
    The Locator LED enables you to find a particular system. The LED is activated
    using one of the following methods:
    • The ALOM CMT command setlocator on.
    • The ILOM command set /SYS/LOCATE value=Fast_Blink.
    • Manually pressing the Locator button to toggle the Locator LED on or off.
    This LED provides the following indications:
    • Off – Normal operating state.
    • Fast blink – The system received a signal as a result of one of the methods
      previously mentioned, indicating that it is active.

Service Required LED (amber)
    If on, indicates that service is required. POST and ILOM are two diagnostic
    tools that can detect a fault or failure resulting in this indication.
    The ILOM show faulty command provides details about any faults that cause
    this indicator to light.
    Under some fault conditions, individual component fault LEDs are lit in
    addition to the system Service Required LED.

Power OK LED (green)
    Provides the following indications:
    • Off – Indicates that the system is not running in its normal state. System
      power might be off. The service processor might be running.
    • Steady on – Indicates that the system is powered on and is running in its
      normal operating state. No service actions are required.
    • Fast blink – Indicates that the system is running at a minimum level in
      standby and is ready to be quickly returned to full function. The service
      processor is running.
    • Slow blink – Indicates that a normal transitory activity is taking place.
      Slow blinking could indicate that the system diagnostics are running, or
      that the system is booting.

Power button
    The recessed Power button toggles the system on or off.
    • If the system is powered off, press once to power on.
    • If the system is powered on, press once to initiate a graceful system
      shutdown.
    • If the system is powered on, press and hold for 4 seconds to initiate an
      emergency shutdown.
    For more information about powering on and powering off the system, see the
    SPARC Enterprise T5440 Server Administration Guide.

Fan Fault LED (amber, labeled TOP FAN)
    Provides the following operational fan indications:
    • Off – Indicates a steady state; no service action is required.
    • Steady on – Indicates that a fan failure event has been acknowledged and a
      service action is required on at least one of the fan modules.

Power Supply Fault LED (amber, labeled REAR PS)
    Provides the following operational PSU indications:
    • Off – Indicates a steady state; no service action is required.
    • Steady on – Indicates that a power supply failure event has been
      acknowledged and a service action is required on at least one PSU.

Overtemp LED (amber)
    Provides the following operational temperature indications:
    • Off – Indicates a steady state; no service action is required.
    • Steady on – Indicates that a temperature failure event has been
      acknowledged and a service action is required.
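
The Power OK LED indications in the table above can be captured as a small lookup, for example in a checklist script used during a site visit. The following is a minimal sketch only: the function name and state keys are hypothetical, and the meanings are paraphrased from the table rather than taken from any Sun or Fujitsu tool.

```python
# Meanings paraphrased from the Power OK LED entry in the table above.
# The dictionary keys and the function name are illustrative assumptions.
POWER_OK_STATES = {
    "off": "Not running in its normal state; system power might be off, "
           "and the service processor might be running.",
    "steady_on": "Powered on and running in its normal operating state; "
                 "no service actions are required.",
    "fast_blink": "Running at a minimum level in standby; the service "
                  "processor is running.",
    "slow_blink": "Normal transitory activity, such as diagnostics "
                  "running or the system booting.",
}


def describe_power_ok(state: str) -> str:
    """Return the documented meaning of an observed Power OK LED state."""
    key = state.strip().lower().replace(" ", "_")
    if key not in POWER_OK_STATES:
        raise ValueError(f"Unrecognized Power OK LED state: {state!r}")
    return POWER_OK_STATES[key]
```

A call such as describe_power_ok("Fast blink") returns the standby description; any state not listed in the table raises an error instead of guessing.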

Related Information
■ “Front Panel Diagram” on page 3
■ “Rear Panel LEDs” on page 7
■ “Detecting Faults Using LEDs” on page 30

Rear Panel Diagram


The rear panel provides access to system I/O ports, PCIe ports, Gigabit Ethernet
ports, power supplies, Locator button and LED, and system status LEDs.

FIGURE: Rear Panel Features on page 6 shows rear panel features on the SPARC
Enterprise T5440 server. For more detailed information about ports and their uses,
see the SPARC Enterprise T5440 Server Installation and Setup Guide. For a detailed
description of PCIe slots, see “PCIe Device Identifiers” on page 96.

FIGURE: Rear Panel Features

Figure Legend

1 Power supplies
2 Serial port
3 Serial management port
4 System status LEDs
5 USB ports
6 Network management port
7 Gigabit Ethernet ports

Related Information
■ “Front Panel Diagram” on page 3
■ “Rear Panel LEDs” on page 7
■ “Ethernet Port LEDs” on page 8
■ “Detecting Faults Using LEDs” on page 30

Rear Panel LEDs
TABLE: Rear Panel System LEDs on page 7 describes the rear panel system LEDs.

TABLE: Rear Panel System LEDs

Each entry lists the LED (with its color), followed by its description.

Locator LED and button (white)
    The Locator LED enables you to find a particular system. The LED is
    activated using one of the following methods:
    • The ALOM CMT command setlocator on.
    • The ILOM command set /SYS/LOCATE value=Fast_Blink.
    • Manually pressing the Locator button to toggle the Locator LED on or off.
    This LED provides the following indications:
    • Off – Normal operating state.
    • Fast blink – The system received a signal as a result of one of the methods
      previously mentioned, indicating that it is active.

Service Required LED (amber)
    If on, indicates that service is required. POST and ILOM are two diagnostic
    tools that can detect a fault or failure resulting in this indication.
    The ILOM show faulty command provides details about any faults that
    cause this indicator to light.
    Under some fault conditions, individual component fault LEDs are lit in
    addition to the system Service Required LED.

Power OK LED (green)
    Provides the following indications:
    • Off – Indicates that the system is not running in its normal state. System
      power might be off. The service processor might be running.
    • Steady on – Indicates that the system is powered on and is running in its
      normal operating state. No service actions are required.
    • Fast blink – Indicates that the system is running at a minimum level in
      standby and is ready to be quickly returned to full function. The service
      processor is running.
    • Slow blink – Indicates that a normal transitory activity is taking place.
      Slow blinking could indicate that the system diagnostics are running, or
      that the system is booting.

Related Information
■ “Rear Panel Diagram” on page 5
■ “Ethernet Port LEDs” on page 8
■ “Detecting Faults Using LEDs” on page 30

Ethernet Port LEDs
The service processor network management port and the four 10/100/1000 Mbps
Ethernet ports each have two LEDs, as described in TABLE: Ethernet Port LEDs on
page 8.

TABLE: Ethernet Port LEDs

LED          Color             Description

Left LED     Amber or green    Speed indicator:
                               • Amber on – The link is operating as a Gigabit
                                 connection (1000-Mbps).*
                               • Green on – The link is operating as a 100-Mbps
                                 connection.
                               • Off – The link is operating as a 10-Mbps
                                 connection.
Right LED    Green             Link/Activity indicator:
                               • Steady on – A link is established.
                               • Blinking – There is activity on this port.
                               • Off – No link is established.

* The NET MGT port operates only at 100 Mbps or 10 Mbps, so its speed indicator
LED is green or off (never amber).
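
The two-LED encoding in the table above can be expressed as a short decoder, which may help when logging observations from several ports. This is a sketch under stated assumptions: the function and argument names are hypothetical, the LED states are entered by hand, and nothing here reads the hardware.

```python
def decode_ethernet_leds(left: str, right: str, is_net_mgt: bool = False):
    """Map observed LED states to (speed, link status) per the table above.

    left  -- "amber", "green", or "off" (speed indicator)
    right -- "steady on", "blinking", or "off" (link/activity indicator)
    """
    speeds = {"amber": "1000 Mbps", "green": "100 Mbps", "off": "10 Mbps"}
    links = {
        "steady on": "link established",
        "blinking": "activity on port",
        "off": "no link",
    }
    left_key = left.strip().lower()
    right_key = right.strip().lower()
    if left_key not in speeds or right_key not in links:
        raise ValueError("LED state not listed in the Ethernet Port LEDs table")
    # The table footnote states that NET MGT never runs at Gigabit speed.
    if is_net_mgt and left_key == "amber":
        raise ValueError("NET MGT speed LED should never be amber")
    return speeds[left_key], links[right_key]
```

Note that when the right LED is off, no link is established, so the speed value returned for the left LED carries no practical meaning in that case.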

Related Information
■ “Rear Panel Diagram” on page 5
■ “Rear Panel LEDs” on page 7
■ “Detecting Faults Using LEDs” on page 30

Managing Faults

These topics describe the diagnostics tools that are available for monitoring and
troubleshooting the server.

This chapter is intended for technicians, service personnel, and system
administrators who service and repair computer systems.

Topic                                         Links

Background: fault detection methodology       “Understanding Fault Handling Options”
                                              on page 9
Configuring and using the service processor   “Connecting to the Service Processor” on
                                              page 22
Displaying system configuration               “Displaying FRU Information with ILOM”
information with the service processor        on page 24
Configuring POST for diagnostic purposes      “Controlling How POST Runs” on page 26
Detecting system faults                       “Detecting Faults” on page 30
Clearing system faults                        “Clearing Faults” on page 47
Disabling faulty components to allow the      “Disabling Faulty Components” on page 50
system to run in a degraded state
ILOM commands and equivalent ALOM             “ILOM-to-ALOM CMT Command
CMT commands                                  Reference” on page 53

Understanding Fault Handling Options

Server Diagnostics Overview


You can use a variety of diagnostic tools, commands, and indicators to monitor and
troubleshoot a server:

■ LEDs – Provide a quick visual notification of the status of the server and of some
of the FRUs.
■ ILOM firmware – This system firmware runs on the service processor. In addition
to providing the interface between the hardware and OS, ILOM also tracks and
reports the health of key server components. ILOM works closely with POST and
Solaris Operating System (Solaris OS) Predictive Self-Healing technology to keep
the system up and running even when there is a faulty component.
■ Power-on self-test (POST) – POST performs diagnostics on system components
upon system reset to ensure the integrity of those components. POST is
configurable and works with ILOM to take faulty components offline if needed.
■ Solaris OS Predictive Self-Healing (PSH) – This technology continuously
monitors the health of the processor and memory, and works with ILOM to take a
faulty component offline if needed. The Predictive Self-Healing technology
enables systems to accurately predict component failures and mitigate many
serious problems before they occur.
■ Log files and console messages – The Solaris OS log files and the ILOM system
event log can be accessed and displayed on the device of your choice.
■ SunVTS software – The SunVTS software exercises the system, provides
hardware validation, and discloses possible faulty components with
recommendations for repair.

The LEDs, ILOM, Solaris OS PSH, and many of the log files and console messages
are integrated. For example, when the Solaris software detects a fault, it displays
the fault, logs it, and passes the information to ILOM, where it is also logged.
Depending on the fault, one or more LEDs might be illuminated.

The diagnostic flowchart in FIGURE: Diagnostic Flowchart on page 11 and the
actions in TABLE: Diagnostic Flowchart Actions on page 12 describe an
approach for using the server diagnostics to identify a faulty field-replaceable unit
(FRU). The diagnostics you use, and the order in which you use them, depend on the
nature of the problem you are troubleshooting, so you might perform some actions
and not others.

Before referring to the flowchart, perform some basic troubleshooting tasks:


■ Verify that the server was installed properly.
■ Visually inspect cables and power.
■ (Optional) Perform a reset of the server.

Related Information
■ “Diagnostic Flowchart” on page 11
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

Diagnostic Flowchart
FIGURE: Diagnostic Flowchart on page 11 is a flowchart of the diagnostics available
to troubleshoot faulty hardware. TABLE: Diagnostic Flowchart Actions on page 12
has more information about each diagnostic in this chapter.

FIGURE: Diagnostic Flowchart

TABLE: Diagnostic Flowchart Actions

Action No. 1 – Check Power OK and AC Present LEDs on the server.
    Resulting action:
    The Power OK LED is located on the front and rear of the chassis.
    The AC Present LED is located on the rear of the server on each power supply.
    If these LEDs are not on, check the power source and power connections to the
    server.
    For more information:
    “Detecting Faults” on page 30

Action No. 2 – Run the ILOM show faulty command to check for faults.
    Resulting action:
    The show faulty command displays the following kinds of faults:
    • Environmental faults
    • External I/O Expansion Unit faults
    • Solaris Predictive Self-Healing (PSH) detected faults
    • POST-detected faults
    Faulty FRUs are identified in fault messages using the FRU name.
    Note – If the ILOM show faulty output includes an error string such as
    Ext sensor or Ext FRU, it indicates a fault in the External I/O Expansion Unit.
    For more information:
    “Detect Faults Using the ILOM show faulty Command” on page 33

Action No. 3 – Check the Solaris log files and ILOM system event log for fault
information.
    Resulting action:
    The Solaris log files and the ILOM system event log record system events, and
    provide information about faults.
    • Browse the ILOM system event log for major or critical events. Some
      problems are logged in the event log but not added to the show faulty list.
    • If system messages indicate a faulty device, replace the FRU.
    • To obtain more diagnostic information, go to Action No. 4.
    For more information:
    “Detecting Faults Using Solaris OS Files and Commands” on page 35

Action No. 4 – Run SunVTS software.
    Resulting action:
    SunVTS is an application you can run to exercise and diagnose FRUs. To run
    SunVTS, the server must be running the Solaris OS.
    • If SunVTS reports a faulty device, replace the FRU.
    • If SunVTS does not report a faulty device, go to Action No. 5.
    For more information:
    “Detecting Faults Using SunVTS Software” on page 38

Action No. 5 – Run POST.
    Resulting action:
    POST performs basic tests of the server components and reports faulty FRUs.
    For more information:
    “Detecting Faults Using POST” on page 42
    TABLE: System Faults and Fault LED States on page 31
    TABLE: ALOM CMT Parameters and POST Modes on page 56

Action No. 6 – Determine if the fault is an environmental or configuration fault.
    Resulting action:
    Determine if the fault is an environmental fault or a configuration fault.
    If the fault listed by the show faulty command displays a temperature or
    voltage fault, then the fault is an environmental fault. Environmental faults
    can be caused by faulty FRUs (power supply or fan), or by environmental
    conditions such as when computer room ambient temperature is too high, or
    the server airflow is blocked. When the environmental condition is corrected,
    the fault will automatically clear.
    If the fault indicates that a fan or power supply is bad, you can perform a
    hot-swap of the FRU. You can also use the fault LEDs on the server to identify
    the faulty FRU (fans and power supplies).
    If the FRU displayed by the show faulty command is /SYS, the fault is a
    configuration problem. /SYS indicates no faulty FRU has been diagnosed, but
    there is a problem with the system configuration.
    For more information:
    “Detecting Faults Using ILOM show faulty Command” on page 32
    “Detecting Faults” on page 30

Action No. 7 – Determine if the fault was detected in the External I/O Expansion
Unit.
    Resulting action:
    Problems detected in the External I/O Expansion Unit include the text string
    Ext FRU or Ext Sensor at the beginning of the fault description.
    For more information:
    “Detecting Faults Using ILOM show faulty Command” on page 32
    “Clear Faults Detected in the External I/O Expansion Unit” on page 50

Action No. 8 – Determine if the fault was detected by PSH.
    Resulting action:
    If the fault displayed includes a uuid and a sunw-msg-id property, the fault
    was detected by the Solaris Predictive Self-Healing software.
    If the fault is a PSH-detected fault, refer to the PSH Knowledge Article web
    site for additional information. The Knowledge Article for the fault is
    located at the following link:
    http://www.sun.com/msg/message-ID
    where message-ID is the value of the sunw-msg-id property displayed by the
    show faulty command.
    After the FRU is replaced, perform the procedure to clear PSH-detected faults.
    For more information:
    “Identifying Faults Detected by PSH” on page 44
    “Clear Faults Detected by PSH” on page 49

Action No. 9 – Determine if the fault was detected by POST.
    Resulting action:
    POST performs basic tests of the server components and reports faulty FRUs.
    When POST detects a faulty FRU, it logs the fault and, if possible, takes the
    FRU offline. Faults on POST-detected FRUs display the following text in the
    fault message:
    Forced fail reason
    In a POST fault message, reason is the name of the power-on routine that
    detected the failure.
    For more information:
    “POST Fault Management Overview” on page 19
    “Clear Faults Detected During POST” on page 48

Action No. 10 – Contact technical support.
    Resulting action:
    The majority of hardware faults are detected by the server’s diagnostics. In
    rare cases a problem might require additional troubleshooting. If you are
    unable to determine the cause of the problem, contact your service
    representative for support.
    For more information:
    “Obtain the Chassis Serial Number” on page 62
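
The decision rules in Actions 6 through 9 can be sketched as a small classifier. This is an illustration only, not a parser for real show faulty output: the function name, its arguments, and the returned labels are hypothetical, and the caller is assumed to have already extracted the FRU name, the fault description, and any properties by hand.

```python
def classify_fault(fru: str, description: str, properties=None) -> str:
    """Apply the checks from Actions 6 to 9 of the table, in table order."""
    props = properties or {}
    text = description.strip().lower()

    # Action 6: temperature or voltage faults are environmental.
    if "temperature" in text or "voltage" in text:
        return "environmental"
    # Action 6: a FRU of /SYS means a configuration problem.
    if fru.strip() == "/SYS":
        return "configuration"
    # Action 7: External I/O Expansion Unit faults start with these strings.
    if text.startswith("ext fru") or text.startswith("ext sensor"):
        return "external_io_expansion_unit"
    # Action 8: PSH-detected faults carry both properties.
    if "uuid" in props and "sunw-msg-id" in props:
        return "psh"
    # Action 9: POST-detected faults contain "Forced fail".
    if "forced fail" in text:
        return "post"
    return "unknown"
```

An "unknown" result corresponds to the cases where the table directs you to gather more diagnostic information or to contact technical support (Action 10).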

Related Information
■ “Server Diagnostics Overview” on page 9
■ SPARC Enterprise T5440 Server Administration Guide

Options for Accessing the Service Processor


There are three methods of interacting with the service processor:
■ Integrated Lights Out Manager (ILOM) shell (default) – Available via the System
Management Port and the Network Management Port.
■ ILOM browser interface (BI) – Documented in the Integrated Lights Out Manager 3.0
Web Interface Procedures Guide.
■ ALOM CMT compatibility shell – Legacy shell emulation of ALOM CMT.

The code examples in this document depict use of the ILOM shell.

Note – Multiple service processor accounts can be active concurrently. A user can be
logged in under one account using the ILOM shell, and another account using the
ALOM CMT shell.

Related Information
■ “Diagnostic Flowchart” on page 11
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

ILOM Overview
The Integrated Lights Out Manager (ILOM) firmware runs on the service processor
in the server, enabling you to remotely manage and administer your server.

ILOM enables you to remotely run diagnostics such as power-on self-test (POST),
that would otherwise require physical proximity to the server’s serial port. You can
also configure ILOM to send email alerts of hardware failures, hardware warnings,
and other events related to the server or to ILOM.

The service processor runs independently of the server, using the server’s standby
power. Therefore, ILOM firmware and software continue to function when the server
OS goes offline or when the server is powered off.

Note – Refer to the Integrated Lights Out Manager 3.0 Concepts Guide for
comprehensive ILOM information.

Faults detected by ILOM, POST, the Solaris Predictive Self-Healing (PSH) technology,
and the External I/O Expansion Unit (if attached) are forwarded to ILOM for fault
handling (FIGURE: ILOM Fault Management on page 16).

In the event of a system fault, ILOM ensures that the Service Required LED is lit,
FRUID PROMs are updated, the fault is logged, and alerts are displayed. Faulty
FRUs are identified in fault messages using the FRU name.

FIGURE: ILOM Fault Management
[Diagram labels: Environmentals, POST, and Solaris PSH feed the ILOM fault manager,
which drives the FRU fault LEDs, the System fault LED, user alerts, and showfaults.]

The service processor can detect when a fault is no longer present and clears the fault
in several ways:
■ Fault recovery – The system automatically detects that the fault condition is no
longer present. The service processor extinguishes the Service Required LED and
updates the FRU’s PROM, indicating that the fault is no longer present.
■ Fault repair – The fault has been repaired by human intervention. In most cases,
the service processor detects the repair and extinguishes the Service Required
LED. If the service processor does not perform these actions, you must perform
these tasks manually by setting the ILOM component_state or fault_state of the
faulted component.

The service processor can detect the removal of a FRU, in many cases even if the FRU
is removed while the service processor is powered off (for example, if the system
power cables are unplugged during service procedures). This function enables ILOM
to know that a fault, diagnosed to a specific FRU, has been repaired.

Note – ILOM does not automatically detect hard drive replacement.

Many environmental faults can automatically recover. A temperature that is
exceeding a threshold might return to normal limits. An unplugged power supply
can be plugged in, and so on. Recovery of environmental faults is automatically
detected.

Note – No ILOM command is needed to manually repair an environmental fault.

The Solaris Predictive Self-Healing technology does not monitor the hard drive for
faults. As a result, the service processor does not recognize hard drive faults, and will
not light the fault LEDs on either the chassis or the hard drive itself. Use the Solaris
message files to view hard drive faults.

Related Information
■ “Diagnostic Flowchart” on page 11

■ “Detecting Faults Using LEDs” on page 30
■ “Detecting Faults Using Solaris OS Files and Commands” on page 35
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

ALOM CMT Compatibility Shell Overview


The default shell for the service processor is the ILOM shell. However, you can use
the ALOM CMT compatibility shell to emulate the ALOM CMT interface supported
on the previous generation of CMT servers. Using the ALOM CMT compatibility
shell (with a few exceptions) you can use commands that resemble the commands of
ALOM CMT.

The service processor sends alerts to all ALOM CMT users that are logged in, sends
the alert through email to a configured email address, and writes the event to the
ILOM event log. The ILOM event log is also available using the ALOM CMT
compatibility shell.

See the Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server
for comparisons between the ILOM CLI and the ALOM CMT compatibility CLI, and
for instructions for adding an ALOM CMT account.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults Using LEDs” on page 30
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

Solaris Predictive Self-Healing Overview


The Solaris Predictive Self-Healing (PSH) technology enables the server to diagnose
problems while the Solaris OS is running, and mitigate many problems before they
negatively affect operations.

The Solaris OS uses the Fault Manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon handles the error by correlating the error with data from previous
errors and other related information to diagnose the problem. Once diagnosed, the
Fault Manager daemon assigns the problem a Universally Unique Identifier (UUID)
that distinguishes the problem across any set of systems. When possible, the Fault
Manager daemon initiates steps to self-heal the failed component and take the
component offline. The daemon also logs the fault to the syslogd daemon and
provides a fault notification with a message ID (MSGID). You can use the message ID
to get additional information about the problem from the knowledge article database.
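
The knowledge article lookup can be sketched as a one-line URL builder. It uses the address given under Action No. 8 of TABLE: Diagnostic Flowchart Actions (http://www.sun.com/msg/ followed by the message ID). The function name is hypothetical, and the sketch does not check that the site is reachable or that an article exists for the ID.

```python
KNOWLEDGE_ARTICLE_BASE = "http://www.sun.com/msg/"  # prefix given in this manual


def knowledge_article_url(message_id: str) -> str:
    """Build the knowledge article link for a PSH message ID (MSGID)."""
    msg = message_id.strip()
    if not msg:
        raise ValueError("A message ID is required")
    return KNOWLEDGE_ARTICLE_BASE + msg
```

Pass the value of the sunw-msg-id property reported for the fault as the message_id argument.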

The Predictive Self-Healing technology covers the following server components:


■ UltraSPARC T2 Plus multicore processor
■ Memory
■ I/O subsystem

The PSH console message provides the following information about each detected
fault:
■ Type
■ Severity
■ Description
■ Automated response
■ Impact
■ Suggested action for system administrator

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Identifying Faults Detected by PSH” on page 44
■ SPARC Enterprise T5440 Server Administration Guide

SunVTS Overview
Sometimes a server exhibits a problem that cannot be isolated definitively to a
particular hardware or software component. In such cases, it might be useful to run a
diagnostic tool that stresses the system by continuously running a comprehensive
battery of tests. SunVTS software is provided for this purpose.

Related Information
■ “Diagnostic Flowchart” on page 11

■ “SunVTS Software Packages” on page 41
■ “Useful SunVTS Tests” on page 42
■ SPARC Enterprise T5440 Server Administration Guide

POST Fault Management Overview


Power-on self-test (POST) is a group of PROM-based tests that run when the server is
powered on or reset. POST checks the basic integrity of the critical hardware
components in the server (CMP, memory, and I/O subsystem).

POST tests critical hardware components to verify functionality before the system
boots and accesses software. If POST detects a faulty component, the component is
disabled automatically, preventing faulty hardware from potentially harming any
software. If the system is capable of running without the disabled component, the
system will boot when POST is complete. For example, if one of the processor cores
is deemed faulty by POST, the core will be disabled. The system will boot and run
using the remaining cores.

You can use POST as an initial diagnostic tool for the system hardware. In this case,
configure POST to run in maximum mode (diag_mode=service, setkeyswitch=diag,
diag_level=max) for thorough test coverage and verbose output.
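
The maximum-mode settings named above can be kept as a reference dictionary and compared against whatever values you have recorded from the service processor. This is a minimal sketch: the variable names come from the paragraph above, while the helper function and the way the current values are obtained are assumptions left to the reader.

```python
# Settings for maximum mode, as named in the paragraph above.
POST_MAX_MODE = {
    "diag_mode": "service",
    "setkeyswitch": "diag",
    "diag_level": "max",
}


def settings_to_change(current: dict) -> dict:
    """Return {name: (current_value, required_value)} for each mismatch."""
    return {
        name: (current.get(name), required)
        for name, required in POST_MAX_MODE.items()
        if current.get(name) != required
    }
```

An empty result means the recorded values already match maximum mode. See “Controlling How POST Runs” on page 26 for how these parameters are actually set.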

POST Fault Management Flowchart
FIGURE: Flowchart of Variables for POST Configuration

Related Information
■ “Diagnostic Flowchart” on page 11

■ “Detecting Faults Using POST” on page 42
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

Memory Fault Handling Overview


A variety of features play a role in how the memory subsystem is configured and
how memory faults are handled. Understanding the underlying features helps you
identify and repair memory problems. This section describes how the server deals
with memory faults.

Note – For memory configuration information, see “Supported FB-DIMM
Configurations” on page 110.

The server uses advanced ECC technology that corrects up to 4 bits in error on nibble
boundaries, as long as the bits are all in the same DRAM. On 4 GB FB-DIMMs, if a
DRAM fails, the DIMM continues to function.

The following server features independently manage memory faults:


■ POST – Based on ILOM configuration variables, POST runs when the server is
powered on.
For correctable memory errors (CEs), POST forwards the error to the Solaris
Predictive Self-Healing (PSH) daemon for error handling. If an uncorrectable
memory fault is detected, POST displays the fault with the device name of the
faulty FB-DIMMs, and logs the fault. POST then disables the faulty FB-DIMMs.
Depending on the memory configuration and the location of the faulty FB-DIMM,
POST disables half of physical memory in the system, or half the physical memory
and half the processor threads. When this offlining process occurs in normal
operation, you must replace the faulty FB-DIMMs based on the fault message and
enable the disabled FB-DIMMs with the ILOM command set device
component_state=enabled where device is the name of the FB-DIMM being
enabled (for example, set /SYS/MB/CPU0/CMP0/BR0/CH0/D0
component_state=enabled).
■ Solaris Predictive Self-Healing (PSH) technology – A feature of the Solaris OS,
PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults.
When a fault occurs, the fault is assigned a unique fault ID (UUID), and logged.
PSH reports the fault and identifies the locations of the faulty FB-DIMMs.
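
The ILOM command for re-enabling a replaced FB-DIMM follows the pattern set device component_state=enabled described in the POST item above. The helper below is a minimal sketch that only builds the command string for pasting at the ILOM prompt; the function name is hypothetical, and the check that device names start with /SYS/ is an assumption based on the example path in this section.

```python
def enable_component_command(device: str) -> str:
    """Build the ILOM command that enables a previously disabled device."""
    name = device.strip()
    # Assumption: device names are ILOM target paths under /SYS, as in the
    # example /SYS/MB/CPU0/CMP0/BR0/CH0/D0 given in this section.
    if not name.startswith("/SYS/"):
        raise ValueError(f"Expected a device path under /SYS, got {device!r}")
    return f"set {name} component_state=enabled"
```

For the example device in this section, the helper returns set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=enabled.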

If you suspect that the server has a memory problem, follow the flowchart (see
FIGURE: Diagnostic Flowchart on page 11). Run the ILOM show faulty command.
The show faulty command lists memory faults and lists the specific FB-DIMMs
that are associated with the fault.

Note – You can use the FB-DIMM DIAG buttons on the CMP module and memory
module to identify faulty FB-DIMMs. See “FB-DIMM Fault Button Locations” on
page 113.

Once you identify which FB-DIMMs you want to replace, see “Servicing FB-DIMMs”
on page 104 for FB-DIMM removal and replacement instructions. You must perform
the instructions in that section to clear the faults and enable the replaced FB-DIMMs.

Related Information
■ “Controlling How POST Runs” on page 26
■ “Displaying FRU Information with ILOM” on page 24
■ “Detecting Faults” on page 30
■ “Servicing FB-DIMMs” on page 104

Connecting to the Service Processor


Before you can run ILOM commands, you must connect to the service processor.
There are several ways to connect to the service processor.

Topic                                         Links

Connect an ASCII terminal directly to the     SPARC Enterprise T5440 Server Installation
serial management port.                       and Setup Guide
Use the ssh command to connect to the         SPARC Enterprise T5440 Server Installation
service processor through an Ethernet         and Setup Guide
connection on the network management port.
Switch from the system console to the         “Switch From the System Console to the
service processor.                            Service Processor (ILOM or ALOM CMT
                                              Compatibility Shell)” on page 23
Switch from the service processor to the      “Switch From ILOM to the System Console”
system console.                               on page 23
                                              “Switch From the ALOM CMT Compatibility
                                              Shell to the System Console” on page 24

Related Information
■ “Diagnostic Flowchart” on page 11

22 SPARC Enterprise T5440 Server Service Manual • July 2009


■ “Switch From the System Console to the Service Processor (ILOM or ALOM CMT
Compatibility Shell)” on page 23
■ “Switch From ILOM to the System Console” on page 23
■ “Switch From the ALOM CMT Compatibility Shell to the System Console” on
page 24
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

▼ Switch From the System Console to the Service Processor (ILOM or ALOM CMT Compatibility Shell)
● To switch from the system console to the service processor prompt, type #.
(Hash-Period).

# #.
->

▼ Switch From ILOM to the System Console


● From the ILOM -> prompt, type start /SP/console.

-> start /SP/console


#

▼ Switch From the ALOM CMT Compatibility Shell to the System Console
● From the ALOM-CMT sc> prompt, type console.

sc> console
#

Displaying FRU Information with ILOM

▼ Display System Components With the ILOM show components Command
The show components command displays the system components (asrkeys) and
reports their status.

● At the -> prompt, type the show components command.


CODE EXAMPLE: Output of the show components Command With No
Disabled Components on page 24 shows partial output with no disabled
components.
CODE EXAMPLE: Output of the show components Command Showing
Disabled Components on page 25 shows the show components command output
with a component disabled.

CODE EXAMPLE: Output of the show components Command With No Disabled Components
-> show components
Target | Property | Value
--------------------+------------------------+-------------------------------
/SYS/MB/PCIE0 | component_state | Enabled
/SYS/MB/PCIE3/ | component_state | Enabled
/SYS/MB/PCIE1/ | component_state | Enabled
/SYS/MB/PCIE4/ | component_state | Enabled
/SYS/MB/PCIE2/ | component_state | Enabled
/SYS/MB/PCIE5/ | component_state | Enabled
/SYS/MB/NET0 | component_state | Enabled
/SYS/MB/NET1 | component_state | Enabled
/SYS/MB/NET2 | component_state | Enabled
/SYS/MB/NET3 | component_state | Enabled
/SYS/MB/PCIE | component_state | Enabled

CODE EXAMPLE: Output of the show components Command Showing Disabled Components
-> show components
Target | Property | Value
--------------------+------------------------+-------------------------------
/SYS/MB/PCIE0/ | component_state | Enabled
/SYS/MB/PCIE3/ | component_state | Disabled
/SYS/MB/PCIE1/ | component_state | Enabled
/SYS/MB/PCIE4/ | component_state | Enabled

/SYS/MB/PCIE2/ | component_state | Enabled
/SYS/MB/PCIE5/ | component_state | Enabled
/SYS/MB/NET0 | component_state | Enabled
/SYS/MB/NET1 | component_state | Enabled
/SYS/MB/NET2 | component_state | Enabled
/SYS/MB/NET3 | component_state | Enabled
/SYS/MB/PCIE | component_state | Enabled
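
When the show components output is long, the disabled entries can be picked out mechanically. The following sketch assumes the output has been captured as text in the three-column layout shown above; the function name is hypothetical and is not an ILOM facility.

```python
def disabled_components(output: str) -> list[str]:
    """Return the targets whose component_state is Disabled."""
    disabled = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # prompt line, separator line, or blank line
        target, prop, value = fields
        if prop == "component_state" and value.lower() == "disabled":
            # Some targets are printed with a trailing slash; drop it.
            disabled.append(target.rstrip("/"))
    return disabled
```

Applied to the second example above, the function returns a list containing only /SYS/MB/PCIE3.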

▼ Display Individual Component Information With the ILOM show Command
Use the show command to display information about individual components in the
server.

● At the -> prompt, enter the show command.


In CODE EXAMPLE: show Command Output on page 25, the show command is
used to get information about a memory module (FB-DIMM).

CODE EXAMPLE: show Command Output


-> show /SYS/MB/CPU0/CMP0/BR1/CH0/D0

/SYS/MB/CPU0/CMP0/BR1/CH0/D0
Targets:
R0
R1
SEEPROM
SERVICE
PRSNT
T_AMB

Properties:
type = DIMM
component_state = Enabled
fru_name = 1024MB DDR2 SDRAM FB-DIMM 333 (PC2 5300)
fru_description = FBDIMM 1024 Mbyte
fru_manufacturer = Micron Technology
fru_version = FFFFFF
fru_part_number = 18HF12872FD667D6D4
fru_serial_number = d81813ce
fault_state = OK
clear_fault_action = (none)

Commands:
cd
show
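
The Properties section of the show output is a list of name = value pairs, which makes it straightforward to read FRU data such as the part number or serial number from captured output. The following is a minimal sketch under that assumption; the function name is hypothetical.

```python
def parse_properties(output: str) -> dict[str, str]:
    """Collect the name = value pairs listed under 'Properties:'."""
    props = {}
    in_properties = False
    for raw in output.splitlines():
        line = raw.strip()
        # Section headings (Targets:, Properties:, Commands:) are single words.
        if line.endswith(":") and " " not in line:
            in_properties = line == "Properties:"
            continue
        if in_properties and " = " in line:
            name, value = line.split(" = ", 1)
            props[name.strip()] = value.strip()
    return props
```

For the example above, the returned dictionary maps fru_serial_number to d81813ce and fault_state to OK.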

Controlling How POST Runs


The server can be configured for normal, extensive, or no POST execution. You can
also control the level of tests that run, the amount of POST output that is displayed,
and which reset events trigger POST by using ILOM command variables.

The keyswitch_state parameter, when set to diag, overrides all the other ILOM
POST variables.

TABLE: ILOM Parameters Used for POST Configuration on page 26 lists the ILOM
variables used to configure POST. FIGURE: Flowchart of Variables for POST
Configuration on page 20 shows how the variables work together.

TABLE: ILOM Parameters Used for POST Configuration

Parameter        Values          Description

keyswitch_state  normal          The system can power on and run POST (based on the
                                 other parameter settings). For details see FIGURE:
                                 Flowchart of Variables for POST Configuration on
                                 page 20. This parameter overrides all other commands.
                 diag            The system runs POST based on predetermined settings.
                 stby            The system cannot power on.
                 locked          The system can power on and run POST, but no flash
                                 updates can be made.
diag_mode        off             POST does not run.
                 normal          Runs POST according to the diag_level value.
                 service         Runs POST with preset values for diag_level and
                                 diag_verbosity.
diag_level       max             If diag_mode = normal, runs all the minimum tests plus
                                 extensive processor and memory tests.
                 min             If diag_mode = normal, runs the minimum set of tests.
diag_trigger     none            Does not run POST on reset.
                 user_reset      Runs POST upon user-initiated resets.
                 power_on_reset  Only runs POST for the first power on. This option is
                                 the default.
                 error_reset     Runs POST if fatal errors are detected.
                 all_resets      Runs POST after any reset.
diag_verbosity   none            No POST output is displayed.
                 min             POST output displays functional tests with a banner
                                 and pinwheel.
                 normal          POST output displays all test and informational
                                 messages.
                 max             POST displays all test, informational, and some
                                 debugging messages.
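
The parameter values in the table can be captured in a small lookup so that a planned setting is checked before it is typed at the ILOM prompt. The following is a sketch only: the dictionary and function names are hypothetical, the virtual keyswitch is named keyswitch_state as in the set commands in this section, and the override check reflects only the statement that the diag keyswitch setting overrides the other POST variables.

```python
# Allowed values, transcribed from the POST configuration table.
POST_PARAMETERS = {
    "keyswitch_state": {"normal", "diag", "stby", "locked"},
    "diag_mode": {"off", "normal", "service"},
    "diag_level": {"max", "min"},
    "diag_trigger": {"none", "user_reset", "power_on_reset",
                     "error_reset", "all_resets"},
    "diag_verbosity": {"none", "min", "normal", "max"},
}


def is_valid_setting(parameter: str, value: str) -> bool:
    """Return True if the value is listed for the parameter in the table."""
    allowed = POST_PARAMETERS.get(parameter)
    if allowed is None:
        raise KeyError(f"unknown POST parameter: {parameter}")
    return value.lower() in allowed


def keyswitch_overrides(keyswitch_state: str) -> bool:
    """Return True when the keyswitch setting overrides the other variables."""
    return keyswitch_state.lower() == "diag"
```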

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Change POST Parameters” on page 28
■ “Run POST in Maximum Mode” on page 28
■ “Detecting Faults Using POST” on page 42
■ “Clear Faults Detected During POST” on page 48
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

▼ Change POST Parameters


1. Access the ILOM prompt.
See “Connecting to the Service Processor” on page 22.

2. Use the ILOM commands to change the POST parameters.
Refer to TABLE: ILOM Parameters Used for POST Configuration on page 26 for a
list of ILOM POST parameters and their values.
The set /SYS keyswitch_state command sets the virtual keyswitch
parameter. For example:

-> set /SYS keyswitch_state=Diag


Set ‘keyswitch_state’ to ‘Diag’

To change individual POST parameters, you must first set the keyswitch_state
parameter to normal. For example:

-> set /SYS keyswitch_state=Normal
Set ‘keyswitch_state’ to ‘Normal’
-> set /HOST/diag property=Min

▼ Run POST in Maximum Mode


This procedure describes how to run POST when you want maximum testing, as in
the case when you are troubleshooting a server, or verifying a hardware upgrade or
repair.

1. Access the ILOM prompt.


See “Connecting to the Service Processor” on page 22.

2. Set the virtual keyswitch to diag so that POST will run in service mode.

-> set /SYS keyswitch_state=Diag


Set ‘keyswitch_state’ to ‘Diag’

3. Reset the system so that POST runs.


There are several ways to initiate a reset. CODE EXAMPLE: Initiating POST With
a Power Cycle on page 29 shows a reset using a power cycle command sequence.
For other methods, refer to the SPARC Enterprise T5440 Server Administration Guide.

Note – The server takes about one minute to power off. Use the show /HOST
command to determine when the host has been powered off. The console displays
status=Powered Off.



4. Switch to the system console to view the POST output:

-> start /SP/console

If no faults were detected, the system will boot.


CODE EXAMPLE: POST Output (Abridged) on page 29 depicts abridged POST
output.

CODE EXAMPLE: Initiating POST With a Power Cycle


-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

CODE EXAMPLE: POST Output (Abridged)


-> start /SP/console

...
2007-12-19 22:01:17.810 0:0:0>INFO: STATUS: Running RGMII 1G
BCM5466R PHY level Loopback Test
2007-12-19 22:01:22.534 0:0:0>End : Neptune 1G Loopback Test -
Port 2
2007-12-19 22:01:22.553 0:0:0>
2007-12-19 22:01:22.542 0:0:0>Begin: Neptune 1G Loopback Test -
Port 3
2007-12-19 22:01:22.556 0:0:0>INFO: STATUS: Running BMAC level
Loopback Test
2007-12-19 22:01:32.004 0:0:0>End : Neptune 1G Loopback Test -
Port 3
2007-12-19 22:01:27.271 0:0:0>
T5440, No Keyboard
Enter #. to return to ALOM.
2007-12-19 22:01:32.012 0:0:0>INFO:
2007-12-19 22:01:27.274 0:0:0>INFO: STATUS: Running RGMII 1G
BCM5466R PHY level Loopback Test
OpenBoot ..., 7968 MB memory available, Serial #75916434.
2007-12-19 22:01:32.019 0:0:0>POST Passed all devices.
[stacie obp #0]
2007-12-19 22:01:32.028 0:0:0>POST:Return to VBSC.
Ethernet address 0:14:4f:86:64:92, Host ID: xxxxx
2007-12-19 22:01:32.036 0:0:0>Master set ACK for vbsc runpost
command and spin...
{0} ok

Detecting Faults
This section describes the different methods you can use to identify system faults in
the SPARC Enterprise T5440.

Task                                        Topic

Use front panel and back panel LEDs to      “Detecting Faults Using LEDs” on page 30
identify system faults.
Use the ILOM show faulty command to         “Detecting Faults Using ILOM show faulty
detect faults.                              Command” on page 32
Use Solaris OS files and commands to        “Detecting Faults Using Solaris OS Files and
detect faults.                              Commands” on page 35
Use the ILOM event log to detect faults.    “Detecting Faults Using the ILOM Event Log”
                                            on page 37
Use POST to identify faults.                “Detecting Faults Using POST” on page 42
Use Solaris Predictive Self-Healing (PSH)   “Identifying Faults Detected by PSH” on
to identify faults.                         page 44

Detecting Faults Using LEDs


The server provides the following groups of LEDs:
■ Front panel system LEDs. See “Front Panel LEDs” on page 4.
■ Rear panel system LEDs. See “Rear Panel LEDs” on page 7.
■ Hard drive LEDs. See “Hard Drive LEDs” on page 80.
■ Power supply LEDs. See “Power Supply LED” on page 91.
■ Fan tray LEDs. See “Fan Tray Fault LED” on page 84.
■ Rear panel Ethernet port LEDs. See “Ethernet Port LEDs” on page 8.
■ CMP module or memory module LEDs. See “Servicing CMP/Memory Modules”
on page 98.
■ FB-DIMM Fault LEDs. See “FB-DIMM Fault Button Locations” on page 113.

These LEDs provide a quick visual check of the state of the system.



TABLE: System Faults and Fault LED States on page 31 describes which fault LEDs
are lit under given error conditions. Use the ILOM show faulty command to obtain
more information about the nature of a given fault. See “Detect Faults Using the
ILOM show faulty Command” on page 33.

TABLE: System Faults and Fault LED States

Component Fault   Fault LEDs Lit                              Additional Information

Power supply      • Service Required LED (front and rear      • “Front Panel LEDs” on page 4
                    panel)                                    • “Rear Panel LEDs” on page 7
                  • Front panel Power Supply Fault LED        • “Power Supply LED” on page 91
                  • Individual power supply Fault LED         • “Servicing Power Supplies” on page 85

Fan tray          • Service Required LED (front and rear      • “Front Panel LEDs” on page 4
                    panel)                                    • “Rear Panel LEDs” on page 7
                  • Front panel Fan Fault LED                 • “Fan Tray Fault LED” on page 84
                  • Individual fan tray Fault LED             • “Servicing Fan Trays” on page 81
                  • Overtemp LED (if overtemp condition
                    exists)

Hard drive        • Service Required LED (front and rear      See these sections:
                    panel)                                    • “Front Panel LEDs” on page 4
                  • Individual hard drive Fault LED           • “Rear Panel LEDs” on page 7
                                                              • “Hard Drive LEDs” on page 80
                                                              • “Servicing Hard Drives” on page 72

CMP module or     • Service Required LED (front and rear      A lit CMP module or memory module fault LED
memory module       panel)                                    might indicate a problem with an FB-DIMM
                  • CMP Module Fault LED or Memory            installed on the CMP module, or a problem with
                    Module Fault LED                          the CMP module itself.
                                                              See these sections:
                                                              • “Front Panel LEDs” on page 4
                                                              • “Rear Panel LEDs” on page 7
                                                              • “Servicing CMP/Memory Modules” on page 98
                                                              • “Servicing FB-DIMMs” on page 104

FB-DIMM           • Service Required LED (front and rear      See these sections:
                    panel)                                    • “Front Panel LEDs” on page 4
                  • CMP Module Fault LED or Memory            • “Rear Panel LEDs” on page 7
                    Module Fault LED                          • “Servicing FB-DIMMs” on page 104
                  • FB-DIMM Fault LED (CMP and memory         • “FB-DIMM Fault Button Locations” on page 113
                    modules) (when FB-DIMM Locate
                    button is pressed)

Other components  • Service Required LED (front and rear      Not all components have an individual component
                    panel)                                    Fault LED. If the Service Required LED is lit, use
                                                              the show faulty command to obtain additional
                                                              information about the component affected. See
                                                              these sections:
                                                              • “Front Panel LEDs” on page 4
                                                              • “Rear Panel LEDs” on page 7

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults Using LEDs” on page 30
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

Detecting Faults Using ILOM show faulty Command
Use the ILOM show faulty command to display the following kinds of faults:



■ Environmental or configuration faults – System configuration faults, or
temperature or voltage problems that might be caused by faulty FRUs (power
supplies, fans, or blower), by room temperature, or by blocked air flow to the
server.
■ POST-detected faults – Faults on devices detected by the POST diagnostics.
■ PSH-detected faults – Faults detected by the Solaris Predictive Self-Healing (PSH)
technology.
■ External I/O Expansion Unit faults – Faults detected in the optional External I/O
Expansion Unit.

Use the show faulty command for the following reasons:


■ To see if any faults have been diagnosed in the system.
■ To verify that the replacement of a FRU has cleared the fault and not generated
any additional faults.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults Using LEDs” on page 30
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

▼ Detect Faults Using the ILOM show faulty Command
● At the -> prompt, type the show faulty command.
The following show faulty command examples show the different kinds of
output from the show faulty command:
■ Example of the show faulty command when no faults are present:

-> show faulty


Target | Property | Value
--------------------+------------------------+-------------------------------

-----------------------------------------------------------------------------

■ Example of the show faulty command displaying an environmental fault:

-> show faulty


Target | Property | Value
--------------------+------------------------+-------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/FT1
/SP/faultmgmt/0 | timestamp | Dec 14 23:01:32
/SP/faultmgmt/0/ | timestamp | Dec 14 23:01:32
faults/0 | |
/SP/faultmgmt/0/ | sp_detected_fault | TACH at /SYS/MB/FT1 has
faults/0 | | exceeded low non-recoverable
| | threshold.

■ Example of the show faulty command displaying a configuration fault:

-> show faulty


Target | Property | Value
------------------+----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS
/SP/faultmgmt/0 | timestamp | Mar 17 08:17:45
/SP/faultmgmt/0/ | timestamp | Mar 17 08:17:45
faults/0 | |
/SP/faultmgmt/0/ | sp_detected_fault | At least 2 power supplies must
faults/0 | | have AC power

Note – Environmental and configuration faults automatically clear when the
environmental condition returns to the normal range or when the configuration fault
is addressed.

■ Example showing a fault that was detected by the PSH technology. These kinds
of faults are distinguished from other kinds of faults by the presence of a
sunw-msg-id and by a UUID.

-> show faulty


Target | Property | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/MEM0/CMP0/BR1/CH1/D1
/SP/faultmgmt/0 | timestamp | Dec 14 22:43:59
/SP/faultmgmt/0/ | sunw-msg-id | SUN4V-8000-DX
faults/0 | |
/SP/faultmgmt/0/ | uuid | 3aa7c854-9667-e176-efe5-e487e520
faults/0 | | 7a8a
/SP/faultmgmt/0/ | timestamp | Dec 14 22:43:59
faults/0 | |



■ Example showing a fault that was detected by POST. These kinds of faults are
identified by the message Forced fail reason where reason is the name of the
power-on routine that detected the failure.

-> show faulty


Target | Property | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/ | timestamp | Dec 21 16:40:56
faults/0 | |
/SP/faultmgmt/0/ | sp_detected_fault | /SYS/MB/CPU0/CMP0/CMP0/BR1/CH0/D0
faults/0 | | Forced fail(POST)

■ Example showing a fault in the External I/O Expansion Unit. These faults can
be identified by the text string Ext FRU or Ext sensor at the beginning of the
fault description.
The text string Ext FRU indicates that the specified FRU is faulty and should
be replaced. The text string Ext sensor indicates that the specified FRU
contains the sensor that detected the problem. In this case, the specified FRU
may not be faulty. Contact service support to isolate the problem.

-> show faulty


Target | Property | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0 | fru | /SYS/IOX@X0TC/IOB1/LINK
/SP/faultmgmt/0 | timestamp | Feb 05 18:28:20
/SP/faultmgmt/0/ | timestamp | Feb 05 18:28:20
faults/0 | |
/SP/faultmgmt/0/ | sp_detected_fault | Ext FRU /SYS/IOX@X0TC/IOB1/LINK
faults/0 | | SIGCON=0 I2C no device response
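
The distinguishing marks described for each kind of fault (a sunw-msg-id and UUID for PSH, Forced fail for POST, Ext FRU or Ext sensor for the External I/O Expansion Unit) can be checked in captured show faulty output. The following sketch assumes the output lists a single fault in the three-column layout shown above, with wrapped rows continued on lines that have an empty Property column. The function names are hypothetical.

```python
def parse_rows(output: str) -> list[list[str]]:
    """Return [target, property, value] rows, joining wrapped lines."""
    rows = []
    for line in output.splitlines():
        if "|" not in line:
            continue  # prompt, separator, or blank line
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3 or fields[1] == "Property":
            continue  # malformed line or column heading
        target, prop, value = fields
        if not prop and rows:
            # Continuation of the previous row.
            rows[-1][0] += target
            if value:
                # A UUID wraps in the middle of a token; other text wraps at spaces.
                sep = "" if rows[-1][1] == "uuid" else " "
                rows[-1][2] += sep + value
        elif prop:
            rows.append([target, prop, value])
    return rows


def classify_fault(rows: list[list[str]]) -> str:
    """Name the kind of fault, using the marks described in this procedure."""
    props = {prop: value for _, prop, value in rows}
    if "sunw-msg-id" in props and "uuid" in props:
        return "PSH"
    detail = props.get("sp_detected_fault", "")
    if detail.startswith(("Ext FRU", "Ext sensor")):
        return "External I/O Expansion Unit"
    if "Forced fail" in detail:
        return "POST"
    return "environmental or configuration"
```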

Detecting Faults Using Solaris OS Files and Commands
With the Solaris OS running on the server, you have the full complement of Solaris
OS files and commands available for collecting information and for troubleshooting.

If POST, ILOM, or the Solaris PSH features do not indicate the source of a fault, check
the message buffer and log files for fault notifications. Hard drive faults are
usually captured by the Solaris message files.

Use the dmesg command to view the most recent system messages. To view the
system messages log file, view the contents of the /var/adm/messages file.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults Using LEDs” on page 30
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

▼ Check the Message Buffer


1. Log in as superuser.

2. Issue the dmesg command:

# dmesg

The dmesg command displays the most recent messages generated by the system.

▼ View System Message Log Files


The error logging daemon, syslogd, automatically records various system
warnings, errors, and faults in message files. These messages can alert you to system
problems such as a device that is about to fail.

The /var/adm directory contains several message files. The most recent messages
are in the /var/adm/messages file. After a period of time (usually every week), a
new messages file is automatically created. The original contents of the messages
file are rotated to a file named messages.1. Over a period of time, the messages are
further rotated to messages.2 and messages.3, and then deleted.

1. Log in as superuser.

2. Type the following command:

# more /var/adm/messages

3. If you want to view all logged messages, type the following command:

# more /var/adm/messages*
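
When you review all of the message files, it helps to read them from newest to oldest. The following sketch orders file names according to the rotation scheme described above (messages first, then rising numeric suffixes). The function name is hypothetical, and the sketch works on file names only; it does not read the files.

```python
import re


def rotation_order(names: list[str]) -> list[str]:
    """Sort message file names from newest (messages) to oldest."""
    def age(name: str) -> int:
        match = re.fullmatch(r"messages(?:\.(\d+))?", name)
        if match is None:
            raise ValueError(f"not a messages file: {name!r}")
        # The unsuffixed file is the current one; larger suffixes are older.
        return -1 if match.group(1) is None else int(match.group(1))

    return sorted(names, key=age)
```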



Detecting Faults Using the ILOM Event Log
Certain problems are recorded in the ILOM event log but not posted to the list of
faults displayed by the ILOM show faulty command. Inspect the ILOM event log
if you suspect a problem, but no entry appears in the ILOM show faulty command
output.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “View ILOM Event Log” on page 37
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

▼ View ILOM Event Log


● Type the following command:

-> show /SP/logs/event/list

Note – The ILOM event log can also be viewed through the ILOM BUI or the ALOM
CMT CLI.

If the log contains an unexpected “major” or “critical” event that does not appear in
the ILOM show faulty output, the event might indicate a system fault. The following
is an example of unexpected major events in the log.

-> show /SP/logs/event/list


1626 Fri Feb 15 18:57:29 2008 Chassis Log major
Feb 15 18:57:29 ERROR: [CMP0 ] Only 4 cores, up to 32 cpus are
configured because some L2_BANKS are unusable

1625 Fri Feb 15 18:57:28 2008 Chassis Log major


Feb 15 18:57:28 ERROR: System DRAM Available: 004096 MB

1624 Fri Feb 15 18:57:28 2008 Chassis Log major


Feb 15 18:57:28 ERROR: [CMP1 ] memc_1_1 unused because associated
L2 banks on CMP0 cannot be used

1623 Fri Feb 15 18:57:27 2008 Chassis Log major
Feb 15 18:57:27 ERROR: Degraded configuration: system operating at
reduced capacity

1622 Fri Feb 15 18:57:27 2008 Chassis Log major


Feb 15 18:57:27 ERROR: [CMP0] /MB/CPU0/CMP0/BR1 neither channel
populated with DIMM0 Branch 1 not configured
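
A long event log can be filtered for the “major” and “critical” entries discussed above. The following sketch assumes the log has been captured as text and that each entry starts with a header line in the form shown in the example (ID, date and time, then three words, here read as class, type, and severity). The function names and the field names are assumptions for illustration.

```python
import re

# Header line, for example: 1626 Fri Feb 15 18:57:29 2008 Chassis Log major
HEADER = re.compile(
    r"^(\d+)\s+(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})"
    r"\s+(\S+)\s+(\S+)\s+(\S+)$"
)


def parse_event_log(text: str) -> list[dict]:
    """Split captured event log text into entries with joined messages."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            events.append({
                "id": int(match.group(1)),
                "time": match.group(2),
                "class": match.group(3),
                "type": match.group(4),
                "severity": match.group(5),
                "message": "",
            })
        elif line and events:
            # Message text, possibly wrapped over several lines.
            events[-1]["message"] = (events[-1]["message"] + " " + line).strip()
    return events


def serious_events(events: list[dict]) -> list[dict]:
    """Keep only the entries logged as major or critical."""
    return [e for e in events if e["severity"].lower() in ("major", "critical")]
```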

Detecting Faults Using SunVTS Software


The SunVTS software features a Java-based browser environment, an ASCII-based
screen interface, and a command-line interface. For more information about how to
use the SunVTS software, see the SunVTS 7.0 User’s Guide.

The Solaris OS must be running in order to use the SunVTS software. You also must
ensure that the SunVTS validation test software is installed on your system.

This section describes the tasks necessary to use SunVTS software to exercise your
server.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Verify Installation of SunVTS Software” on page 38
■ “Start the SunVTS Browser Environment” on page 39
■ “SunVTS Software Packages” on page 41
■ “Useful SunVTS Tests” on page 42
■ SPARC Enterprise T5440 Server Administration Guide
■ SunVTS 7.0 User’s Guide

▼ Verify Installation of SunVTS Software


To perform this procedure, the Solaris OS must be running on the server, and you
must have access to the Solaris command line.

Note – The SunVTS 7.0 software, and future compatible versions, are supported on
the server.



The SunVTS installation process requires that you specify one of two security
schemes to use when running SunVTS. The security scheme you choose must be
properly configured in the Solaris OS for you to run the SunVTS software. For
details, refer to the SunVTS User’s Guide.

1. Check for the presence of SunVTS packages using the pkginfo command.

% pkginfo -l SUNWvts SUNWvtsmn SUNWvtsr SUNWvtss SUNWvtsts

■ If the SunVTS software is installed, information about the packages is
displayed.
■ If the SunVTS software is not installed, you see an error message for each
missing package, as in CODE EXAMPLE: Missing Package Errors for SunVTS
Software on page 39.
See TABLE: SunVTS Software Packages on page 41 for a list of required SunVTS
software packages.

2. If the SunVTS software is not installed, you can obtain the installation
packages from the following places:
■ Solaris Operating System DVDs
■ Download from the web. Refer to the Preface for information on how to access
the web site.

CODE EXAMPLE: Missing Package Errors for SunVTS Software


ERROR: information for "SUNWvts" was not found
ERROR: information for "SUNWvtsr" was not found
...
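
The error lines name each missing package, so a captured pkginfo result can be reduced to a list of packages that still need to be installed. The following is a minimal sketch that assumes the error wording shown in the example above; the function name is hypothetical.

```python
import re

# Matches lines such as: ERROR: information for "SUNWvts" was not found
MISSING = re.compile(r'ERROR: information for "([^"]+)" was not found')


def missing_packages(pkginfo_output: str) -> list[str]:
    """Return the package names that pkginfo reported as not found."""
    return MISSING.findall(pkginfo_output)
```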

▼ Start the SunVTS Browser Environment


For information about test options and prerequisites, refer to the SunVTS 7.0 User’s
Guide.

Note – SunVTS software can be run in several modes. You must perform this
procedure using the default mode.

1. Start the SunVTS agent and Javabridge on the server.

# cd /usr/sunvts/bin
# ./startsunvts

2. At the interface prompt, choose C to start the SunVTS client.

3. Start the SunVTS browser environment from a web browser on the client
system. Type https://server-name:6789.
The SunVTS browser environment is displayed (FIGURE: SunVTS Browser
Environment (Test Group Screen) on page 40).

FIGURE: SunVTS Browser Environment (Test Group Screen)

4. (Optional) Select the test categories you want to run.


Certain test categories are enabled by default. You can choose to accept these.

Note – TABLE: Useful SunVTS Tests on page 42 lists test categories that are
especially useful to run on this server.

5. (Optional) Customize individual tests.


Click on the name of the test to select and customize individual tests.

Tip – Use the System Exerciser – High Stress Mode to test system operations. Use
the Component Stress – High setting for the highest stress possible.



6. Start testing.
Click the Start Tests button. Status and error messages appear in the test messages
area located across the bottom of the window. You can stop testing at any time by
clicking the Stop button.
During testing, the SunVTS software logs all status and error messages. To view
these messages, click the Logs tab. You can choose to view the following logs:
■ Test Error – Detailed error messages from individual tests.
■ SunVTS Test Kernel (Vtsk) Error – Error messages pertaining to the SunVTS
software itself. Look here if the SunVTS software appears to be acting strangely,
especially when it starts up.
■ Information – Detailed versions of all the status and error messages that
appear in the test messages area.
■ Solaris OS Messages (/var/adm/messages) – A file containing messages
generated by the operating system and various applications.
■ Test Messages (/var/sunvts/logs/sunvts.info) – A directory containing
the SunVTS log files.

SunVTS Software Packages


TABLE: SunVTS Software Packages on page 41 lists SunVTS packages.

TABLE: SunVTS Software Packages

Package Description

SUNWvts Test development library APIs and SunVTS kernel. You must install
this package to run the SunVTS software.
SUNWvtsmn Man pages for the SunVTS utilities, including the command-line
utility.
SUNWvtsr SunVTS framework (root)
SUNWvtss SunVTS browser user interface (BUI) components required on the
server system.
SUNWvtsts SunVTS test binaries

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Useful SunVTS Tests” on page 42
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

Useful SunVTS Tests
TABLE: Useful SunVTS Tests on page 42 describes the SunVTS tests that are useful
for diagnosing issues with the SPARC Enterprise T5440 server.

TABLE: Useful SunVTS Tests

SunVTS Tests         FRUs Exercised by Tests

Memory Test          FB-DIMMs
Processor Test       CMP, motherboard
Disk Test            Disks, cables, disk backplane, DVD drive
Network Test         Network interface, network cable, CMP, motherboard
Interconnect Test    Board ASICs and interconnects
IO Ports Test        I/O (serial port interface), USB subsystem
Environmental Test   Motherboard and service processor

Related Information
■ “Diagnostic Flowchart” on page 11
■ “SunVTS Software Packages” on page 41
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide

Detecting Faults Using POST


Run POST in maximum mode to detect system faults. See “Run POST in Maximum
Mode” on page 28.

POST error messages use the following syntax:

c:s > ERROR: TEST = failing-test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W under test above
c:s > MSG = test-error-message
c:s > END_ERROR

In this syntax, c = the core number, s = the strand number.

Warning and informational messages use the following syntax:



INFO or WARNING: message

In CODE EXAMPLE: POST Error Message on page 43, POST reports a memory error
at FB-DIMM location /SYS/MB/CPU0/CMP0/BR1/CH0/D0. The error was detected
by POST running on core 7, strand 2.

CODE EXAMPLE: POST Error Message


7:2>
7:2>ERROR: TEST = Data Bitwalk
7:2>H/W under test = /SYS/MB/CPU0/CMP0/BR1/CH0/D0
7:2>Repair Instructions: Replace items in order listed by 'H/W
under test' above.
7:2>MSG = Pin 149 failed on /SYS/MB/CPU0/CMP0/BR1/CH0/D0 (J792)
7:2>END_ERROR

7:2>Decode of Dram Error Log Reg Channel 2 bits


60000000.0000108c
7:2> 1 MEC 62 R/W1C Multiple corrected
errors, one or more CE not logged
7:2> 1 DAC 61 R/W1C Set to 1 if the error
was a DRAM access CE
7:2> 108c SYND 15:0 RW ECC syndrome.
7:2>
7:2> Dram Error AFAR channel 2 = 00000000.00000000
7:2> L2 AFAR channel 2 = 00000000.00000000
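
The error syntax described above can be read mechanically from captured POST output. The following sketch extracts the core, strand, failing test, FRU, and message from the first error block. It assumes the two-number c:s prefix documented in this section and ignores lines with any other prefix; the function name and the dictionary keys are hypothetical.

```python
import re

# Matches the c:s> prefix and captures the rest of the line.
PREFIX = re.compile(r"^(\d+):(\d+)\s*>\s*(.*)$")


def parse_post_error(text: str):
    """Return the first POST error block as a dict, or None if there is none."""
    current = None
    for raw in text.splitlines():
        match = PREFIX.match(raw.strip())
        if not match:
            continue  # wrapped text or a line with a different prefix
        core, strand, body = int(match.group(1)), int(match.group(2)), match.group(3)
        if body.startswith("ERROR: TEST ="):
            current = {"core": core, "strand": strand,
                       "test": body.split("=", 1)[1].strip()}
        elif current is not None:
            if body.startswith("H/W under test ="):
                current["fru"] = body.split("=", 1)[1].strip()
            elif body.startswith("MSG ="):
                current["msg"] = body.split("=", 1)[1].strip()
            elif body.startswith("END_ERROR"):
                return current
    return None
```

For the example above, the result identifies core 7, strand 2, the Data Bitwalk test, and the FRU /SYS/MB/CPU0/CMP0/BR1/CH0/D0.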

Perform further investigation if needed.


■ If POST detects a faulty device, the fault is displayed and the fault information is
passed to the service processor for fault handling. Faulty FRUs are identified in
fault messages using the FRU name.
■ The fault is captured by the service processor, where the fault is logged, the
Service Required LED is lit, and the faulty component is disabled. See
CODE EXAMPLE: Fault Detected by POST on page 48.
■ Run the ILOM show faulty command to obtain additional fault information.

In this example, /SYS/MB/CPU0/CMP0/BR1/CH0/D0 is disabled. The system can
boot using memory that was not disabled until the faulty component is replaced.

Note – You can use ASR commands to display and control disabled components. See
“Disabling Faulty Components” on page 50.

Related Information
■ “Diagnostic Flowchart” on page 11

■ “POST Fault Management Overview” on page 19
■ “POST Fault Management Flowchart” on page 20
■ SPARC Enterprise T5440 Server Administration Guide

Identifying Faults Detected by PSH


When a PSH fault is detected, a Solaris console message similar to CODE EXAMPLE:
Console Message Showing Fault Detected by PSH on page 44 is displayed.

CODE EXAMPLE: Console Message Showing Fault Detected by PSH


SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
PLATFORM: SUNW,system_name, CSN: -, HOSTNAME: wgs48-37
SOURCE: cpumem-diagnosis, REV: 1.5
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
DESC: The number of errors associated with this memory module has exceeded
acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-DX for more
information.
AUTO-RESPONSE: Pages of memory associated with this memory module are being
removed from service as errors are reported.
IMPACT: Total system memory capacity will be reduced as pages are retired.
REC-ACTION: Schedule a repair procedure to replace the affected memory module.
Use fmdump -v -u <EVENT_ID> to identify the module.

Faults detected by the Solaris PSH facility are also reported through service processor
alerts.
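
If console messages are captured to a log, the message ID and event ID can be pulled out with a few lines of script. The following Python sketch assumes the header layout shown in the console message above (comma-separated KEY: value pairs); the function names are hypothetical, and the only command it builds is the fmdump invocation that the message itself recommends.

```python
SAMPLE = """\
SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
PLATFORM: SUNW,system_name, CSN: -, HOSTNAME: wgs48-37
SOURCE: cpumem-diagnosis, REV: 1.5
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
"""

# Only these header lines are parsed; the free-text fields are ignored.
HEADER_PREFIXES = ("SUNW-MSG-ID:", "SOURCE:", "EVENT-ID:")


def parse_psh_message(text):
    """Return selected header fields of a PSH console message as a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith(HEADER_PREFIXES):
            continue
        for part in line.split(", "):
            key, _, value = part.partition(": ")
            fields[key.strip()] = value.strip()
    return fields


def fmdump_command(fields):
    """Build the fmdump command suggested in the REC-ACTION text."""
    return "fmdump -v -u " + fields["EVENT-ID"]


parsed = parse_psh_message(SAMPLE)
print(parsed["SUNW-MSG-ID"], parsed["SEVERITY"])
print(fmdump_command(parsed))
```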

Note – You can configure ILOM to generate SNMP traps or e-mail alerts when a
fault is detected by Solaris PSH. You can also configure the ALOM CMT
compatibility shell to display Solaris PSH alerts. See the Integrated Lights Out Manager
3.0 Concepts Guide.

CODE EXAMPLE: ALOM CMT Alert of PSH Diagnosed Fault on page 44 depicts an
ALOM CMT alert of the same fault reported by Solaris PSH in CODE EXAMPLE:
Console Message Showing Fault Detected by PSH on page 44.

CODE EXAMPLE: ALOM CMT Alert of PSH Diagnosed Fault


SC Alert: Host detected fault, MSGID: SUN4V-8000-DX

The ILOM show faulty command provides summary information about the fault.
See “Detect Faults Using the ILOM show faulty Command” on page 33 for more
information about the show faulty command.



Note – The Service Required LED is also turned on for PSH diagnosed faults.

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Solaris Predictive Self-Healing Overview” on page 17
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

▼ Detect Faults Identified by the Solaris PSH
Facility Using the ILOM fmdump Command
The ILOM fmdump command displays the list of faults detected by the Solaris PSH
facility and identifies the faulty FRU for a particular EVENT_ID (UUID).

Note – Do not use fmdump to verify that a FRU replacement has cleared a fault,
because the output of fmdump is the same after the FRU has been replaced. Use the
fmadm faulty command to verify that the fault has cleared. See “Clear Faults
Detected by PSH” on page 49.

1. Check the event log using the fmdump command with -v for verbose output.
In CODE EXAMPLE: Output from the fmdump -v Command on page 46, a fault
is displayed, indicating the following details.
■ Date and time of the fault (Jul 31 12:47:42.2007)
■ Universal Unique Identifier (UUID). The UUID is unique for every fault
(fd940ac2-d21e-c94a-f258-f8a9bb69d05b)
■ Message identifier, which can be used to obtain additional fault information
(SUN4V-8000-JA)
■ Faulted FRU. The information provided in the example includes the part
number of the FRU (part=541215101) and the serial number of the FRU
(serial=101083). The Location field provides the name of the FRU. In
CODE EXAMPLE: Output from the fmdump -v Command on page 46 the FRU
name is MB, meaning the motherboard.

Note – fmdump displays the PSH event log. Entries remain in the log after the fault
has been repaired.

2. Use the message ID to obtain more information about this type of fault.

a. In a browser, go to the Predictive Self-Healing Knowledge Article web site:


http://www.sun.com/msg

b. Obtain the message ID from the console output or the ILOM show faulty
command.

c. Enter the message ID in the SUNW-MSG-ID field, and click Lookup.


In CODE EXAMPLE: PSH Message Output on page 46, the message ID
SUN4V-8000-JA provides information for corrective action:

3. Follow the suggested actions to repair the fault.

CODE EXAMPLE: Output from the fmdump -v Command


# fmdump -v -u fd940ac2-d21e-c94a-f258-f8a9bb69d05b
TIME                 UUID                                 SUNW-MSG-ID
Jul 31 12:47:42.2007 fd940ac2-d21e-c94a-f258-f8a9bb69d05b SUN4V-8000-JA
  100%  fault.cpu.ultraSPARC-T2.misc_regs

        Problem in: cpu:///cpuid=16/serial=5D67334847
           Affects: cpu:///cpuid=16/serial=5D67334847
               FRU: hc://:serial=101083:part=541215101/motherboard=0
          Location: MB
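
The FRU details described in Step 1 can also be extracted from saved fmdump -v output. The following Python sketch is an assumption-laden illustration: it relies only on the FRU: and Location: line formats visible in the example above, and the function name is hypothetical.

```python
import re

SAMPLE = """\
Problem in: cpu:///cpuid=16/serial=5D67334847
Affects: cpu:///cpuid=16/serial=5D67334847
FRU: hc://:serial=101083:part=541215101/motherboard=0
Location: MB
"""


def parse_fmdump_fru(output):
    """Return the FRU serial number, part number, and location, if present."""
    info = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("FRU:"):
            # Values end at the next ':' or '/' separator in the hc:// string.
            for key in ("serial", "part"):
                match = re.search(key + r"=([^:/\s]+)", line)
                if match:
                    info[key] = match.group(1)
        elif line.startswith("Location:"):
            info["location"] = line.split(":", 1)[1].strip()
    return info


print(parse_fmdump_fru(SAMPLE))
```

Only the FRU: line is searched for serial and part, so the CPU serial number on the Problem in: and Affects: lines is not mistaken for the FRU serial number.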

CODE EXAMPLE: PSH Message Output


CPU errors exceeded acceptable levels

Type
Fault
Severity
Major
Description
The number of errors associated with this CPU has exceeded
acceptable levels.
Automated Response
The fault manager will attempt to remove the affected CPU from
service.
Impact
System performance may be affected.

Suggested Action for System Administrator

Schedule a repair procedure to replace the affected CPU, the
identity of which can be determined using fmdump -v -u <EVENT_ID>.

Details
The Message ID: SUN4V-8000-JA indicates diagnosis has
determined that a CPU is faulty. The Solaris fault manager arranged
an automated attempt to disable this CPU....

Clearing Faults
This section describes how to clear faults.

Note – Some system faults are cleared automatically.

Task                                                        Topic

Clear faults detected during POST.                          “Clear Faults Detected During POST” on page 48
Clear faults detected by PSH.                               “Clear Faults Detected by PSH” on page 49
Clear faults detected in the External I/O Expansion Unit.   “Clear Faults Detected in the External I/O Expansion Unit” on page 50

Related Information
■ “Diagnostic Flowchart” on page 11
■ “POST Fault Management Overview” on page 19
■ “Solaris Predictive Self-Healing Overview” on page 17
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server
■ External I/O Expansion Unit Installation and Service Manual for SPARC Enterprise
T5120/T5140/T5220/T5240/T5440 Servers

▼ Clear Faults Detected During POST
In most cases, when POST detects a faulty component, POST logs the fault and
automatically takes the failed component out of operation by placing the component
in the ASR blacklist. See “Disabling Faulty Components” on page 50.

In most cases, the replacement of the faulty FRU is detected when the service
processor is reset or power cycled. In this case, the fault is automatically cleared from
the system. This procedure describes how to identify a POST-detected fault and, if
necessary, manually clear the fault.

1. After replacing a faulty FRU, at the ILOM prompt use the show faulty
command to identify POST-detected faults.
Faults detected by POST are distinguished from other kinds of faults by the text:
Forced fail. No UUID number is reported. Refer to CODE EXAMPLE: Fault
Detected by POST on page 48.
If no fault is reported, you do not need to do anything else. Do not perform the
subsequent steps.

2. Use the component_state property of the component to clear the fault and
remove the component from the ASR blacklist.
Use the FRU name that was reported in the fault in Step 1:

-> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Enabled

The fault is cleared and should not show up when you run the show faulty
command. Additionally, the Service Required LED is no longer on.

3. Reset the server.


You must reboot the server for the component_state property to take effect.

4. At the ILOM prompt, use the show faulty command to verify that no faults
are reported.

-> show faulty


Target | Property | Value
--------------------+------------------------+------------------

->

CODE EXAMPLE: Fault Detected by POST


-> show faulty
Target                | Property               | Value
----------------------+------------------------+------------------------------
/SP/faultmgmt/0       | fru                    | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0       | timestamp              | Dec 21 16:40:56
/SP/faultmgmt/0/      | timestamp              | Dec 21 16:40:56
faults/0              |                        |
/SP/faultmgmt/0/      | sp_detected_fault      | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
faults/0              |                        | Forced fail(POST)
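
When show faulty output is saved to a file, the wrapped rows make it awkward to scan for POST-detected faults. The following Python sketch shows one way to rejoin the continuation rows and test for the Forced fail text described in Step 1. It assumes the three-column, pipe-separated layout shown in the example; the function names are hypothetical.

```python
SAMPLE = """\
-> show faulty
Target | Property | Value
----------------------+------------------------+----------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/ | timestamp | Dec 21 16:40:56
faults/0 | |
/SP/faultmgmt/0/ | sp_detected_fault | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
faults/0 | | Forced fail(POST)
"""


def parse_show_faulty(output):
    """Return a list of rows, merging continuation lines into the row above."""
    rows = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # prompt, separator, or blank line
        target, prop, value = cells
        if prop == "Property":
            continue  # header row
        if prop:
            rows.append({"target": target, "property": prop, "value": value})
        elif rows:
            # An empty Property cell marks a wrapped continuation of the previous row.
            rows[-1]["target"] += target
            rows[-1]["value"] = (rows[-1]["value"] + " " + value).strip()
    return rows


def is_post_detected(rows):
    """True if any sp_detected_fault entry carries the 'Forced fail' text."""
    return any(
        row["property"] == "sp_detected_fault" and "Forced fail" in row["value"]
        for row in rows
    )


rows = parse_show_faulty(SAMPLE)
print(rows[0]["value"], is_post_detected(rows))
```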

▼ Clear Faults Detected by PSH


When the Solaris PSH facility detects faults, the faults are logged and displayed on
the console. In most cases, after the fault is repaired, the corrected state is detected by
the system and the fault condition is repaired automatically. However, this repair
should be verified. In cases where the fault condition is not automatically cleared, the
fault must be cleared manually.

1. After replacing a faulty FRU, power on the server.

2. At the ILOM prompt, use the show faulty command to identify PSH-detected
faults.
■ If no fault is reported, you do not need to do anything else. Do not perform the
subsequent steps.
■ If a fault is reported, perform Step 3 and Step 4.

3. Use the clear_fault_action property of the FRU to clear the fault from the
service processor. For example:

-> set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 clear_fault_action=True


Are you sure you want to clear /SYS/MB/CPU0/CMP0/BR0/CH0/D0 (y/n)? y
Set ’clear_fault_action’ to ’true’

4. Clear the fault from all persistent fault records.


In some cases, even though the fault is cleared, some persistent fault information
remains and results in erroneous fault messages at boot time. To ensure that these
messages are not displayed, perform the following Solaris command:
fmadm repair UUID
Example:

# fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86
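
The two clearing procedures above differ in which property is set and where each command is run. The following Python sketch summarizes them as data. It is a reading aid, not a tool shipped with the server: the function name is hypothetical, and the use of stop /SYS and start /SYS for the "Reset the server" step is an assumption borrowed from the component enable and disable procedures in this chapter.

```python
def clearing_steps(fru, detected_by, uuid=None):
    """Return (where, command) pairs for clearing a fault on the given FRU."""
    if detected_by == "POST":
        return [
            ("ILOM", "set %s component_state=Enabled" % fru),
            ("ILOM", "stop /SYS"),    # assumed reset sequence
            ("ILOM", "start /SYS"),
            ("ILOM", "show faulty"),  # verify that no faults remain
        ]
    if detected_by == "PSH":
        if not uuid:
            raise ValueError("a PSH fault needs the UUID for fmadm repair")
        return [
            ("ILOM", "set %s clear_fault_action=True" % fru),
            ("Solaris", "fmadm repair %s" % uuid),
        ]
    raise ValueError("unknown fault source: %r" % detected_by)


for where, command in clearing_steps(
    "/SYS/MB/CPU0/CMP0/BR0/CH0/D0", "PSH",
    uuid="7ee0e46b-ea64-6565-e684-e996963f7b86",
):
    print(where, ":", command)
```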

▼ Clear Faults Detected in the External I/O
Expansion Unit
For service processor detected faults in the External I/O Expansion Unit, the fault
must be manually cleared from ILOM show faulty after the problem has been
repaired.

Note – After the problem has been repaired, the fault will also be cleared from the
ILOM show faulty command by resetting the service processor.

The example below shows a problem detected in the External I/O Expansion Unit:

-> show faulty
Target                | Property               | Value
----------------------+------------------------+--------------------------------
/SP/faultmgmt/0       | fru                    | /SYS/IOX@X0TC/IOB1/LINK
/SP/faultmgmt/0       | timestamp              | Feb 05 18:28:20
/SP/faultmgmt/0/      | timestamp              | Feb 05 18:28:20
faults/0              |                        |
/SP/faultmgmt/0/      | sp_detected_fault      | Ext FRU /SYS/IOX@X0TC/IOB1/LINK
faults/0              |                        | SIGCON=0 I2C no device response

● After the problem is repaired, use the ILOM set clear_fault_action
command to clear a fault in the External I/O Expansion Unit.

-> set clear_fault_action=true /SYS/IOX@X0TC/IOB1/LINK


Are you sure you want to clear /SYS/IOX@X0TC/IOB1/LINK (y/n)? y
Set ’clear_fault_action’ to ’true’

Disabling Faulty Components


You can use the Automatic System Recovery (ASR) feature to configure the server to
automatically disable failed components until they can be replaced. The following
components are managed by the ASR feature:
■ UltraSPARC T2 Plus processor strands
■ Memory FB-DIMMs



■ I/O subsystem

The database that contains the list of disabled components is referred to as the ASR
blacklist (asr-db).

In most cases, POST automatically disables a faulty component. After the cause of
the fault is repaired (FRU replacement, loose connector reseated, and so on), you
might need to remove the component from the ASR blacklist.

Note – For instructions on enabling or disabling ASR, see the SPARC Enterprise
T5440 Server Administration Guide.

The ASR commands (TABLE: ASR Commands on page 51) enable you to view and
manually add or remove components (asrkeys) from the ASR blacklist. You run
these commands from the ILOM -> prompt.

TABLE: ASR Commands

Command                               Description

show components                       Displays system components and their current state.
set asrkey component_state=Enabled    Removes a component from the asr-db blacklist, where
                                      asrkey is the component to enable.
set asrkey component_state=Disabled   Adds a component to the asr-db blacklist, where asrkey
                                      is the component to disable.

Note – The asrkeys vary from system to system, depending on how many cores and
how much memory are present. Use the show components command to see the
asrkeys on a given system.

Note – A reset or power cycle is required after disabling or enabling a component.
If the status of a component is changed, the change has no effect on the system
until the next reset or power cycle.
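
The deferred effect described in the note above can be pictured with a small model. The following Python sketch is a toy illustration only and does not describe how the firmware stores asr-db; the class and method names are hypothetical. It shows a requested component_state that does not change the active blacklist until a reset is simulated.

```python
class AsrBlacklistModel:
    """Toy model: component_state changes take effect only at the next reset."""

    def __init__(self, asrkeys):
        self.active = {key: "Enabled" for key in asrkeys}
        self.requested = dict(self.active)

    def set_component_state(self, asrkey, state):
        if asrkey not in self.requested:
            raise KeyError(asrkey)
        if state not in ("Enabled", "Disabled"):
            raise ValueError(state)
        self.requested[asrkey] = state  # recorded, but not yet in effect

    def reset(self):
        # Stands in for a reset or power cycle of the host.
        self.active = dict(self.requested)

    def blacklist(self):
        return sorted(k for k, s in self.active.items() if s == "Disabled")


model = AsrBlacklistModel(["/SYS/MB/CPU0/CMP0/BR1/CH0/D0"])
model.set_component_state("/SYS/MB/CPU0/CMP0/BR1/CH0/D0", "Disabled")
print(model.blacklist())  # still empty before the reset
model.reset()
print(model.blacklist())  # the component is now blacklisted
```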

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults” on page 30
■ SPARC Enterprise T5440 Server Administration Guide

▼ Disable System Components
The component_state property disables a component by adding it to the ASR
blacklist.

1. At the -> prompt, set the component_state property to Disabled:

-> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Disabled

2. Reset the server so that the ASR command takes effect.

-> stop /SYS


Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

Note – In the ILOM shell there is no notification when the system is actually
powered off. Powering off takes about a minute. Use the show /HOST command to
determine if the host has powered off.

▼ Re-Enable System Components


The component_state property enables a component by removing it from the ASR
blacklist.

1. At the -> prompt, set the component_state property to Enabled.

-> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Enabled

2. Reset the server so that the ASR command takes effect.

-> stop /SYS


Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS



Note – In the ILOM shell there is no notification when the system is actually
powered off. Powering off takes about a minute. Use the show /HOST command to
determine if the host has powered off.

ILOM-to-ALOM CMT Command Reference
TABLE: Service-Related Commands on page 53 describes the typical
commands for servicing a server. For descriptions of all ALOM CMT commands,
issue the help command or refer to the following documents:
■ Integrated Lights Out Manager 3.0 Concepts Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

TABLE: Service-Related Commands

ILOM Command                           ALOM CMT Command                       Description

help [command]                         help [command]                         Displays a list of all available commands
                                                                              with syntax and descriptions. Specifying a
                                                                              command name as an option displays help
                                                                              for that command.

set /HOST/send_break_action true       break [-y][-c][-D]                     Takes the host server from the OS to
                                       • -y skips the confirmation question.  either kmdb or OpenBoot PROM (equivalent
                                       • -c executes a console command after  to a Stop-A), depending on the mode in
                                         the break command completes.         which the Solaris software was booted.
                                       • -D forces a core dump of the
                                         Solaris OS.

set /SYS/component/clear_fault_action  clearfault UUID                        Manually clears host-detected faults. The
true                                                                          UUID is the unique fault ID of the fault
                                                                              to be cleared.

start /SP/console                      console [-f]                           Connects you to the host system.
                                       • -f forces the console to have read
                                         and write capabilities.

show /SP/console/history               consolehistory [-b lines|-e lines|-v]  Displays the contents of the system’s
                                       [-g lines] [boot|run]                  console buffer.
                                       The following options enable you to
                                       specify how the output is displayed:
                                       • -g lines specifies the number of
                                         lines to display before pausing.
                                       • -e lines displays n lines from the
                                         end of the buffer.
                                       • -b lines displays n lines from the
                                         beginning of the buffer.
                                       • -v displays the entire buffer.
                                       • boot|run specifies the log to
                                         display (run is the default log).

set /HOST/bootmode/value[normal|       bootmode value                         Enables control of the firmware during
reset_nvram|bootscript=string]         [normal|reset_nvram|                   system initialization with the following
                                       bootscript=string]                     options:
                                                                              • normal is the default boot mode.
                                                                              • reset_nvram resets OpenBoot PROM
                                                                                parameters to their default values.
                                                                              • bootscript=string enables the passing
                                                                                of a string to the boot command.

stop /SYS; start /SYS                  powercycle [-f]                        Performs a poweroff followed by poweron.
                                       The -f option forces an immediate
                                       poweroff. Otherwise the command
                                       attempts a graceful shutdown.

stop /SYS                              poweroff [-y] [-f]                     Powers off the host server.
                                       • -y enables you to skip the
                                         confirmation question.
                                       • -f forces an immediate shutdown.

start /SYS                             poweron [-c]                           Powers on the host server.
                                       • -c executes a console command after
                                         completion of the poweron command.

set /SYS/PSx/prepare_to_remove_action  removefru PS0|PS1                      Indicates if it is okay to perform a
true                                                                          hot-swap of a power supply. This command
                                                                              does not perform any action, but it
                                                                              provides a warning if the power supply
                                                                              should not be removed because the other
                                                                              power supply is not enabled.

reset /SYS                             reset [-y] [-c]                        Generates a hardware reset on the host
                                       • -y enables you to skip the           server.
                                         confirmation question.
                                       • -c executes a console command after
                                         completion of the reset command.

reset /SP                              resetsc [-y]                           Reboots the service processor.
                                       • -y enables you to skip the
                                         confirmation question.

set /SYS/keyswitch_state value         setkeyswitch [-y] value                Sets the virtual keyswitch.
normal | stby | diag | locked          normal | stby | diag | locked
                                       • -y enables you to skip the
                                         confirmation question when setting
                                         the keyswitch to stby.

set /SYS/LOCATE value=value            setlocator value                       Turns the Locator LED on the server on or
[Fast_blink | Off]                     [on | off]                             off.

(No ILOM equivalent.)                  showenvironment                        Displays the environmental status of the
                                                                              host server. This information includes
                                                                              system temperatures, power supply, front
                                                                              panel LED, hard drive, fan, voltage, and
                                                                              current sensor status. See “Display
                                                                              Individual Component Information With the
                                                                              ILOM show Command” on page 25.

show faulty                            showfaults [-v]                        Displays current system faults. See
                                                                              “Detecting Faults” on page 30.

(No ILOM equivalent.)                  showfru [-g lines] [-s | -d] [FRU]     Displays information about the FRUs in
                                       • -g lines specifies the number of     the server.
                                         lines to display before pausing the
                                         output to the screen.
                                       • -s displays static information
                                         about system FRUs (defaults to all
                                         FRUs, unless one is specified).
                                       • -d displays dynamic information
                                         about system FRUs (defaults to all
                                         FRUs, unless one is specified).
                                       See “Display Individual Component
                                       Information With the ILOM show
                                       Command” on page 25.

show /SYS/keyswitch_state              showkeyswitch                          Displays the status of the virtual
                                                                              keyswitch.

show /SYS/LOCATE                       showlocator                            Displays the current state of the Locator
                                                                              LED as either on or off.

show /SP/logs/event/list               showlogs [-b lines | -e lines | -v]    Displays the history of all events logged
                                       [-g lines] [-p logtype[r|p]]]          in the service processor event buffers
                                                                              (in RAM or the persistent buffers).

show /SYS                              showplatform [-v]                      Displays information about the operating
                                                                              state of the host system, the system
                                                                              serial number, and whether the hardware
                                                                              is providing service.
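
For administrators who still think in ALOM CMT terms, the command pairs in the table above can be kept as a small lookup. The following Python sketch covers only the commands whose ILOM form takes no arguments in the table; the dictionary and function names are hypothetical, and None marks the two commands that the table lists as having no ILOM equivalent.

```python
# ALOM CMT command name -> ILOM command, as listed in the table above.
ALOM_TO_ILOM = {
    "help": "help",
    "console": "start /SP/console",
    "consolehistory": "show /SP/console/history",
    "poweroff": "stop /SYS",
    "poweron": "start /SYS",
    "reset": "reset /SYS",
    "resetsc": "reset /SP",
    "showfaults": "show faulty",
    "showkeyswitch": "show /SYS/keyswitch_state",
    "showlocator": "show /SYS/LOCATE",
    "showlogs": "show /SP/logs/event/list",
    "showplatform": "show /SYS",
    "showenvironment": None,  # no ILOM equivalent listed
    "showfru": None,          # no ILOM equivalent listed
}


def ilom_equivalent(alom_command):
    """Return the ILOM command for an ALOM CMT command line (options are ignored)."""
    words = alom_command.split()
    if not words or words[0] not in ALOM_TO_ILOM:
        raise KeyError("not in the lookup: %r" % alom_command)
    return ALOM_TO_ILOM[words[0]]


print(ilom_equivalent("showfaults -v"))
print(ilom_equivalent("resetsc -y"))
```

ALOM CMT options such as -y or -v are dropped because the table does not give ILOM equivalents for them.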

TABLE: ALOM CMT Parameters and POST Modes on page 56 shows typical
combinations of ALOM CMT variables and associated POST modes.

TABLE: ALOM CMT Parameters and POST Modes

Parameter         Normal Diagnostic Mode      No POST Execution           Diagnostic Service Mode     Keyswitch Diagnostic
                  (Default Settings)                                                                  Preset Values

diag_mode         normal                      Off                         service                     normal
keyswitch_state   normal                      normal                      normal                      diag
diag_level        max                         N/A                         max                         max
diag_trigger      power-on-reset              None                        all-resets                  all-resets
                  error-reset
diag_verbosity    normal                      N/A                         max                         max
Description of    This is the default POST    POST does not run,          POST runs the full          POST runs the full
POST execution    configuration. This         resulting in quick system   spectrum of tests with      spectrum of tests with
                  configuration tests the     initialization. This is     the maximum output          the maximum output
                  system thoroughly and       not a suggested             displayed.                  displayed.
                  suppresses some of the      configuration.
                  detailed POST output.
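
The parameter combinations in the table above can be encoded as data to check which POST mode a set of values corresponds to. The following Python sketch is a convenience illustration: the mode keys (normal_default, no_post, service, keyswitch_diag) are hypothetical labels for the four table columns, the parameter names are written with underscores, and N/A entries are stored as None.

```python
# One entry per column of the table; values are copied from the table rows.
POST_MODES = {
    "normal_default": {
        "diag_mode": "normal", "keyswitch_state": "normal", "diag_level": "max",
        "diag_trigger": ("power-on-reset", "error-reset"), "diag_verbosity": "normal",
    },
    "no_post": {
        "diag_mode": "off", "keyswitch_state": "normal", "diag_level": None,
        "diag_trigger": ("none",), "diag_verbosity": None,
    },
    "service": {
        "diag_mode": "service", "keyswitch_state": "normal", "diag_level": "max",
        "diag_trigger": ("all-resets",), "diag_verbosity": "max",
    },
    "keyswitch_diag": {
        "diag_mode": "normal", "keyswitch_state": "diag", "diag_level": "max",
        "diag_trigger": ("all-resets",), "diag_verbosity": "max",
    },
}


def modes_matching(**settings):
    """Return the mode labels whose table values match every given setting."""
    return [
        name for name, params in POST_MODES.items()
        if all(params.get(key) == value for key, value in settings.items())
    ]


print(modes_matching(diag_mode="service"))
print(modes_matching(diag_mode="normal", keyswitch_state="diag"))
```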

Related Information
■ “Diagnostic Flowchart” on page 11
■ “Detecting Faults Using LEDs” on page 30
■ “ILOM-to-ALOM CMT Command Reference” on page 53
■ SPARC Enterprise T5440 Server Administration Guide
■ Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise T5440 Server

Preparing to Service the System

These topics describe how to prepare the SPARC Enterprise T5440 for servicing.

Topic                                                     Links

Observe proper safety practices.                          “Safety Information” on page 59
Gather the tools needed to perform service procedures.    “Required Tools” on page 62
Obtain the chassis serial number.                         “Obtain the Chassis Serial Number” on page 62
Power off the system.                                     “Powering Off the System” on page 63
Slide the server out of the equipment rack.               “Extending the Server to the Maintenance Position” on page 65
Remove the server from the equipment rack.                “Removing the Server From the Rack” on page 67
Remove the top cover to access internal components.       “Removing the Top Cover” on page 69

Related Information
■ “Managing Faults” on page 9
■ “Servicing Customer-Replaceable Units” on page 71
■ “Servicing Field-Replaceable Units” on page 115
■ “Returning the Server to Operation” on page 149

Safety Information
This section describes important safety information that you need to know prior to
removing or installing parts in the SPARC Enterprise T5440 server.

For your protection, observe the following safety precautions when setting up your
equipment:
■ Follow all cautions and instructions marked on the equipment and described in
the documentation shipped with your system.
■ Follow all cautions and instructions marked on the equipment and described in
the SPARC Enterprise T5440 Server Safety and Compliance Guide.
■ Ensure that the voltage and frequency of your power source match the voltage
and frequency inscribed on the equipment’s electrical rating label.
■ Follow the electrostatic discharge safety practices as described in this section.

Related Information
■ “Safety Symbols” on page 60
■ “Antistatic Wrist Strap” on page 61
■ “Antistatic Mat” on page 61
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

Safety Symbols
Note the meanings of the following symbols that might appear in this document:

Caution – There is a risk of personal injury or equipment damage. To avoid
personal injury and equipment damage, follow the instructions.

Caution – Hot surface. Avoid contact. Surfaces are hot and might cause personal
injury if touched.

Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.

Related Information
■ “Safety Information” on page 59



Electrostatic Discharge Safety Measures
Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards,
hard drives, and memory modules require special handling.

Caution – Circuit boards and hard drives contain electronic components that are
extremely sensitive to static electricity. Ordinary amounts of static electricity from
clothing or the work environment can destroy the components located on these
boards. Do not touch the components along their connector edges.

Caution – You must disconnect both power supplies before servicing any of the
components documented in this chapter.

Related Information
■ “Safety Information” on page 59
■ “Antistatic Wrist Strap” on page 61
■ “Antistatic Mat” on page 61

Antistatic Wrist Strap


Wear an antistatic wrist strap and use an antistatic mat when handling components
such as hard drive assemblies, circuit boards, or PCI cards. When servicing or
removing server components, attach an antistatic strap to your wrist and then to a
metal area on the chassis. Following this practice equalizes the electrical potentials
between you and the server.

Note – An antistatic wrist strap is no longer included in the accessory kit for the
SPARC Enterprise T5440 servers. However, antistatic wrist straps are still included
with options.

Antistatic Mat
Place ESD-sensitive components such as motherboards, memory, and other PCBs on
an antistatic mat.



Required Tools
The SPARC Enterprise T5440 server can be serviced with the following tools:
■ Antistatic wrist strap
■ Antistatic mat
■ No. 1 Phillips screwdriver
■ No. 2 Phillips screwdriver
■ 7 mm hex driver
■ No. 1 flat-blade screwdriver (battery removal)
■ Pen or pencil (power on server)

▼ Obtain the Chassis Serial Number


To obtain support for your system, you need your chassis serial number.

● The chassis serial number is located on a sticker that is on the front of the
server and another sticker on the side of the server.

▼ Obtain the Chassis Serial Number Remotely
● Use the ILOM show /SYS command to obtain the chassis serial number.

-> show /SYS

/SYS
Targets:
SERVICE
LOCATE
ACT
PS_FAULT
TEMP_FAULT
FAN_FAULT
...
Properties:
type = Host System
keyswitch_state = Normal
product_name = T5440
product_serial_number = 0723BBC006
fault_state = OK
clear_fault_action = (none)
power_state = On

Commands:
cd
reset
set
show
start
stop
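
If the show /SYS output is captured by a script, the serial number can be read from the Properties section. The following Python sketch assumes the section headings and the name = value layout shown in the output above; the function name is hypothetical.

```python
SAMPLE = """\
/SYS
Targets:
SERVICE
LOCATE
Properties:
type = Host System
keyswitch_state = Normal
product_name = T5440
product_serial_number = 0723BBC006
fault_state = OK
clear_fault_action = (none)
power_state = On

Commands:
cd
show
"""


def parse_ilom_properties(output):
    """Return the name = value pairs listed under the Properties: heading."""
    props = {}
    in_properties = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Section headings: Targets:, Properties:, Commands:
            in_properties = stripped == "Properties:"
            continue
        if in_properties and " = " in stripped:
            key, _, value = stripped.partition(" = ")
            props[key.strip()] = value.strip()
    return props


print(parse_ilom_properties(SAMPLE)["product_serial_number"])
```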

Powering Off the System


Note – Additional information about powering off the system is located in the
SPARC Enterprise T5440 Server Administration Guide.

Related Information
■ “Power Off From the Command Line” on page 64
■ “Power Off – Graceful Shutdown” on page 64
■ “Power Off – Emergency Shutdown” on page 65



▼ Power Off From the Command Line
1. Shut down the Solaris OS.
At the Solaris prompt, type:

# shutdown -g0 -i0 -y


# svc.startd: The system is coming down. Please wait.
svc.startd: 91 system services are now being stopped.
Jun 12 19:46:57 wgs41-58 syslogd: going down on signal 15
svc.startd: The system is down.
syncing file systems...done
Program terminated
r)eboot o)k prompt, h)alt?

2. Switch from the system console prompt to the service processor console prompt.
Type:

ok #.
->

3. From the ILOM -> prompt, type:

-> stop /SYS


Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS

->

Note – To perform an immediate shutdown, use the stop -force -script /SYS
command. Ensure that all data is saved before entering this command.

▼ Power Off – Graceful Shutdown


● Press and release the Power button.
If necessary, use a pen or pencil to press the Power button.



▼ Power Off – Emergency Shutdown

Caution – All applications and files will be closed abruptly without saving changes.
File system corruption might occur.

● Press and hold the Power button for four seconds.

▼ Disconnect Power Cords From the Server


● Unplug all power cords from the server.

Caution – Because 3.3V standby power is always present in the system, you must
unplug the power cords before accessing any cold-serviceable components.

Extending the Server to the Maintenance Position
The following components can be serviced with the server in the maintenance
position:
■ Fan trays
■ CMP/memory modules
■ FB-DIMMs
■ PCIe/XAUI cards
■ Service processor
■ Power supply backplane
■ Hard drive backplane

Related Information
■ “Front Panel Diagram” on page 3
■ “Rear Panel Diagram” on page 5
■ “Extend the Server to the Maintenance Position” on page 66



▼ Extend the Server to the Maintenance Position
1. (Optional) Use the set /SYS/LOCATE command from the -> prompt to locate
the system that requires maintenance.

-> set /SYS/LOCATE value=Fast_Blink

Once you have located the server, press the Locator LED and button to turn it off.

2. Verify that no cables will be damaged or will interfere when the server is
extended.
Although the cable management arm (CMA) that is supplied with the server is
hinged to accommodate extending the server, you should ensure that all cables
and cords are capable of extending.

3. From the front of the server, release the two slide release latches (FIGURE:
Extending the Server Into the Maintenance Position on page 66).
Squeeze the slide rail locks to release the slide rails.

FIGURE: Extending the Server Into the Maintenance Position

Figure Legend

1 Slide Rail Lock


2 Inner Rail Release Button



4. While squeezing the slide rail locks, slowly pull the server forward until it is
locked in the service position.

Removing the Server From the Rack


The server must be removed from the rack to remove or install the following
components:
■ Motherboard

Caution – Two people must dismount and carry the chassis.

FIGURE: Lift Warning

Related Information
■ “Front Panel Diagram” on page 3
■ “Rear Panel Diagram” on page 5
■ “Extend the Server to the Maintenance Position” on page 66
■ “Remove the Server From the Rack” on page 67

▼ Remove the Server From the Rack


1. Disconnect all the cables and power cords from the server.

2. Extend the server to the maintenance position.


See “Extending the Server to the Maintenance Position” on page 65.



3. Disconnect the CMA.
Pull out the retention pin that secures the cable management arm (CMA) to the
rack rail (FIGURE: Removing the Server From the Rack on page 68). Slide the
CMA out of the end of the inner glide. The CMA is still attached to the cabinet,
but the server is now disconnected from the CMA.

FIGURE: Removing the Server From the Rack

Figure Legend

1 Disconnect system cables and CMA.


2 Press inner rail release buttons to remove the server from the rack.

Caution – Use two people to dismount and carry the chassis.

4. From the front of the server, press inner rail release buttons and pull the server
forward until it is free of the rack rails.

5. Set the server on a sturdy work surface.



Performing Electrostatic Discharge –
Antistatic Prevention Measures

Related Information
■ “Electrostatic Discharge Safety Measures” on page 61
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

▼ Perform Electrostatic Discharge – Antistatic Prevention Measures
1. Prepare an antistatic surface to set parts on during the removal, installation, or
replacement process.
Place ESD-sensitive components such as the printed circuit boards on an antistatic
mat. The following items can be used as an antistatic mat:
■ Antistatic bag used to wrap a replacement part
■ ESD mat
■ A disposable ESD mat (shipped with some replacement parts or optional
system components)

2. Attach an antistatic wrist strap.


When servicing or removing server components, attach an antistatic strap to your
wrist and then to a metal area on the chassis.

Removing the Top Cover

▼ Remove the Top Cover


Before you begin, complete these tasks:



■ Read the section, “Safety Information” on page 59.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

1. Loosen the two captive No. 2 Phillips screws at the rear edge of the top panel.

2. Slide the top cover to the rear about 0.5 inch (12.7 mm).

3. Remove the top cover.


Lift up and remove the cover.

Caution – If the top cover is removed before the server is powered off, the server
will immediately disable the front panel Power button and shut down. After such an
event, you must replace the top cover and use the poweron command to power on
the server. See “Power On the Server” on page 153.

FIGURE: Removing the Top Cover



Servicing Customer-Replaceable
Units

These topics describe how to service customer-replaceable units (CRUs) in the
SPARC Enterprise T5440 server.

Topic                                       Links

Read and learn about components which can   “Hot-Pluggable and Hot-Swappable Devices” on page 72
be serviced while the system is in
operation.
Remove, install and add hard drives.        “Servicing Hard Drives” on page 72
Remove and install fan trays.               “Servicing Fan Trays” on page 81
Remove and install power supplies.          “Servicing Power Supplies” on page 85
Remove, install, and add PCIe cards.        “Servicing PCIe Cards” on page 92
Remove, install, and add CMP or memory      “Servicing CMP/Memory Modules” on page 98
modules.
Remove, install, and add FB-DIMMs.          “Servicing FB-DIMMs” on page 104
Exploded views of CRUs                      “Customer-Replaceable Units” on page 174

Related Information
■ “Servicing Field-Replaceable Units” on page 115

Hot-Pluggable and Hot-Swappable
Devices
Hot-pluggable devices are those devices that you can remove and install while the
server is running. However, you must perform administrative tasks before or after
installing the hardware (for example, mounting a hard drive). In the SPARC
Enterprise T5440 server, the following devices are hot-pluggable:
■ Hard drives

Hot-swappable devices are those devices that can be removed and installed while
the server is running without affecting the rest of the server’s capabilities. In the
SPARC Enterprise T5440 server, the following devices are hot-swappable:
■ Fan trays
■ Power supplies

Note – The chassis-mounted hard drives can be hot-swappable, depending on how
they are configured.
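The distinction above can be captured in a small lookup. The following Python sketch is illustrative only: the device labels and the function name are hypothetical, the hot-plug and hot-swap entries follow the two lists above, and the "power-off" entries are an assumption drawn from the "Before you begin" lists of the PCIe, CMP/memory module, and FB-DIMM procedures in this chapter.

```python
# Illustrative lookup; the keys are hypothetical labels, not ILOM names.
SERVICE_MODE = {
    "hard drive": "hot-plug",           # administrative steps needed before/after
    "fan tray": "hot-swap",             # no administrative steps needed
    "power supply": "hot-swap",
    "pcie card": "power-off",           # assumption: procedure starts with power off
    "cmp/memory module": "power-off",   # assumption: procedure starts with power off
    "fb-dimm": "power-off",             # assumption: procedure starts with power off
}


def service_mode(device: str) -> str:
    """Return how a device may be serviced; raise KeyError if it is not listed."""
    return SERVICE_MODE[device.strip().lower()]
```

A call such as `service_mode("Fan Tray")` returns `"hot-swap"`; an unlisted device raises `KeyError` rather than guessing.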

Related Information
■ “Servicing Hard Drives” on page 72
■ “Servicing Fan Trays” on page 81
■ “Servicing Power Supplies” on page 85
■ “Server Components” on page 173

Servicing Hard Drives


The hard drives in the server are hot-pluggable, but this capability depends on how
the hard drives are configured. To hot-plug a drive you must take the drive offline
before you can safely remove it. Taking a drive offline prevents any applications
from accessing it, and removes the logical software links to it.

Caution – You must use hard drives designed for this server, which have a vented
front panel to allow adequate airflow to internal system components. Installing
inappropriate hard drives could result in an overtemperature condition.

The following situations inhibit your ability to hot-plug a drive:


■ If the hard drive contains the operating system, and the operating system is not
mirrored on another drive.
■ If the hard drive cannot be logically isolated from the online operations of the
server.

If either of these conditions applies to your drive, you must power off the server
before you replace the hard drive.
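The two inhibiting conditions can be expressed as a simple check. This is a minimal sketch, not a tool shipped with the server; the function and parameter names are hypothetical, and the caller is assumed to know whether the drive holds the operating system, whether the operating system is mirrored, and whether the drive can be logically isolated.

```python
def can_hot_plug(holds_os: bool, os_mirrored: bool, can_isolate: bool) -> bool:
    """Return True only if neither inhibiting condition applies."""
    # Condition 1: the drive contains the OS and the OS is not mirrored elsewhere.
    if holds_os and not os_mirrored:
        return False
    # Condition 2: the drive cannot be logically isolated from online operations.
    if not can_isolate:
        return False
    return True
```

When the function returns `False`, the drive must be replaced with the server powered off.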

Related Information
■ “Identifying Server Components” on page 1
■ “Managing Faults” on page 9
■ “Powering Off the System” on page 63
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Hard Drive Device Identifiers” on page 79
■ “Hard Drive LEDs” on page 80
■ “Server Components” on page 173

▼ Remove a Hard Drive (Hot-Plug)


Removing a hard drive from the server is a three-step process. You must first identify
the drive you want to remove, unconfigure that drive from the server, and then
manually remove the drive from the chassis.

Note – See “Hard Drive Device Identifiers” on page 79 for information about
identifying hard drives.

Before you begin, complete these tasks:


■ Read the section, “Safety Information” on page 59.

1. At the Solaris prompt, issue the cfgadm -al command to list all drives in the
device tree, including drives that are not configured. Type:

# cfgadm -al

This command should identify the Ap_id for the hard drive you wish to remove,
as in CODE EXAMPLE: Sample Ap_id Output on page 75.

2. Issue the cfgadm -c unconfigure command to unconfigure the disk.


For example, type:

# cfgadm -c unconfigure c0::dsk/d1t1d0

where c0::dsk/d1t1d0 is the Ap_id of the disk that you are trying to unconfigure.

3. Wait until the blue Ready-to-Remove LED lights.


This LED will help you identify which drive is unconfigured and can be removed.

4. On the drive you plan to remove, push the hard drive release button to open the
latch (FIGURE: Removing a Hard Drive on page 74).

FIGURE: Removing a Hard Drive

Caution – The latch is not an ejector. Do not bend the latch too far. Doing so can
damage the latch.

5. Grasp the latch and pull the drive out of the drive slot.

CODE EXAMPLE: Sample Ap_id Output

Ap_id             Type       Receptacle  Occupant      Condition
c0                scsi-bus   connected   configured    unknown
c0::dsk/d1t0d0    disk       connected   configured    unknown
c0::dsk/d1t1d0    disk       connected   configured    unknown
usb0/1            unknown    empty       unconfigured  ok
usb0/2            unknown    empty       unconfigured  ok
usb0/3            unknown    empty       unconfigured  ok
usb1/1            unknown    empty       unconfigured  ok
usb1/2            unknown    empty       unconfigured  ok
usb1/3            unknown    empty       unconfigured  ok
usb2/1            unknown    empty       unconfigured  ok
usb2/2            unknown    empty       unconfigured  ok
usb2/3            unknown    empty       unconfigured  ok
usb2/4            unknown    empty       unconfigured  ok
usb2/5            unknown    empty       unconfigured  ok
usb2/6            unknown    empty       unconfigured  ok
usb2/7            unknown    empty       unconfigured  ok
usb2/8            unknown    empty       unconfigured  ok
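
If you capture the cfgadm -al listing to a file or a variable, the disk Ap_ids can be picked out programmatically. The following Python sketch assumes the five-column layout shown in the sample output above; the function names are hypothetical, and the sample text is an excerpt of that listing rather than live output.

```python
from typing import Dict, List

FIELDS = ("ap_id", "type", "receptacle", "occupant", "condition")

# Excerpt of the sample listing above, used here only as test input.
SAMPLE = """\
Ap_id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/d1t0d0 disk connected configured unknown
c0::dsk/d1t1d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
"""


def parse_cfgadm(text: str) -> List[Dict[str, str]]:
    """Turn a five-column cfgadm -al listing into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, blank lines, and anything that is not five columns.
        if len(parts) != len(FIELDS) or parts[0] == "Ap_id":
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def disk_ap_ids(text: str) -> List[str]:
    """Return the Ap_id of every attachment point whose Type is 'disk'."""
    return [row["ap_id"] for row in parse_cfgadm(text) if row["type"] == "disk"]
```

Lines that do not split into exactly five fields are ignored, so a listing in a different format yields an empty result instead of wrong data.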

▼ Install a Hard Drive (Hot-Plug)


Installing a hard drive into the SPARC Enterprise T5440 server is a two-step process.
You must first install the hard drive into the desired drive slot. Then you must
configure that drive to the server.

Perform the following process to install a hard drive.

1. If necessary, remove the blank panel from the chassis.

Note – The server might have up to three blank panels covering unoccupied drive
slots.

2. Align the replacement drive to the drive slot.


Hard drives are physically addressed according to the slot in which they are
installed. If you removed an existing hard drive from a slot in the server, you must
install the replacement drive in the same slot as the drive that was removed.

3. Slide the drive into the drive slot until it is fully seated.

FIGURE: Installing a Hard Drive

4. Close the latch to lock the drive in place.

5. At the Solaris prompt, type the cfgadm -al command to list all drives in the
device tree, including any drives that are not configured. Type:

# cfgadm -al

This command should help you identify the Ap_id for the hard drive you
installed. For an output example, refer to CODE EXAMPLE: Sample Ap_id Output
on page 77.

6. Type the cfgadm -c configure command to configure the disk.


For example, type:

# cfgadm -c configure c0::sd1

where c0::sd1 is the disk that you are trying to configure.

7. Wait until the blue Ready-to-Remove LED is no longer lit on the drive that you
installed.

8. At the Solaris prompt, type the cfgadm -al command to list all drives in the
device tree, including any drives that are not configured. Type:

# cfgadm -al

This command should identify the Ap_id for the hard drive that you installed.
The drive you installed should be listed as configured.

9. Type the iostat -E command:

# iostat -E

The iostat -E command displays information about your system’s installed
devices, such as manufacturer, model number, serial number, size, and system
error statistics.

CODE EXAMPLE: Sample Ap_id Output

Ap_id             Type       Receptacle  Occupant      Condition
c0                scsi-bus   connected   configured    unknown
c0::dsk/d1t0d0    disk       connected   configured    unknown
c0::sd1           disk       connected   unconfigured  unknown
usb0/1            unknown    empty       unconfigured  ok
usb0/2            unknown    empty       unconfigured  ok
usb0/3            unknown    empty       unconfigured  ok
usb1/1            unknown    empty       unconfigured  ok
usb1/2            unknown    empty       unconfigured  ok
usb1/3            unknown    empty       unconfigured  ok
usb2/1            unknown    empty       unconfigured  ok
usb2/2            unknown    empty       unconfigured  ok
usb2/3            unknown    empty       unconfigured  ok
usb2/4            unknown    empty       unconfigured  ok
usb2/5            unknown    empty       unconfigured  ok
usb2/6            unknown    empty       unconfigured  ok
usb2/7            unknown    empty       unconfigured  ok
usb2/8            unknown    empty       unconfigured  ok
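
If you script the configure and unconfigure steps, it can help to build the cfgadm argument list in one place so that only the two state changes used in this chapter are accepted. The sketch below is an illustration under that assumption: the function name is hypothetical, and it only constructs the argument list without running anything.

```python
from typing import List

# Only the two cfgadm -c state changes used in this chapter.
ALLOWED_ACTIONS = ("configure", "unconfigure")


def cfgadm_command(action: str, ap_id: str) -> List[str]:
    """Build the argument list for 'cfgadm -c <action> <ap_id>'."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not ap_id or any(ch.isspace() for ch in ap_id):
        raise ValueError("Ap_id must be a non-empty string without whitespace")
    return ["cfgadm", "-c", action, ap_id]
```

On the Solaris host, the returned list could be passed to `subprocess.run` by a privileged user; the listing from cfgadm -al should still be checked afterward to confirm the new Occupant state.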

▼ Remove a Hard Drive


If you are removing a hard drive as a prerequisite for another service procedure,
follow the steps in this section.

Before you begin, complete these tasks:


■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

Do the following:

1. Note the location of each hard drive.

Note – You must install each hard drive in the same bay from which it was removed.

2. Press the hard drive latch release button.

FIGURE: Removing a Hard Drive

3. Slide the hard drive out of its bay.

▼ Install a Hard Drive


If you are installing a hard drive after servicing another component in the system, do
the following:

1. Align the replacement drive to the drive slot.


Hard drives are physically addressed according to the slot in which they are
installed. If you removed an existing hard drive from a slot in the server, you must
install the replacement drive in the same slot as the drive that was removed.

2. Slide the drive into the drive slot until it is fully seated.

FIGURE: Installing a Hard Drive

3. Close the latch to lock the drive in place.

4. If you performed any additional service procedures, see “Power On the Server”
on page 153.

Hard Drive Device Identifiers


TABLE: Physical Drive Locations, FRU Names, and Default Drive Path Names on
page 79 lists physical drive locations and their corresponding default path names in
OpenBoot PROM and Solaris for the SPARC Enterprise T5440 server.

TABLE: Physical Drive Locations, FRU Names, and Default Drive Path Names

Device  Device Identifier  OpenBoot PROM/Solaris Default Drive Path Name

HDD0    /SYS/HDD0          c0::dsk/d1t0d0
HDD1    /SYS/HDD1          c0::dsk/d1t1d0
HDD2    /SYS/HDD2          c0::dsk/d1t2d0
HDD3    /SYS/HDD3          c0::dsk/d1t3d0

Note – Hard drive names in ILOM messages are displayed with the full FRU name,
such as /SYS/HDD0.
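
The table follows a regular pattern, so the three names for a slot can be derived from the slot number. This Python sketch is a convenience illustration only; the function name is hypothetical, and the path pattern is copied as printed in the table above, so verify it against cfgadm -al on your system.

```python
from typing import Dict

HDD_SLOTS = 4  # HDD0 through HDD3, per the table above


def hdd_identifiers(slot: int) -> Dict[str, str]:
    """Return the device name, FRU name, and default path name for a drive slot."""
    if not 0 <= slot < HDD_SLOTS:
        raise ValueError(f"slot must be 0..{HDD_SLOTS - 1}, got {slot}")
    return {
        "device": f"HDD{slot}",
        "fru": f"/SYS/HDD{slot}",
        # Pattern as printed in the table; an assumption for any other system.
        "path": f"c0::dsk/d1t{slot}d0",
    }
```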

Related Information
■ “Hard Drive LEDs” on page 80

Hard Drive LEDs


FIGURE: Hard Drive LEDs

TABLE: Hard Drive Status LEDs

No.  LED               Color  Notes

1    Ready-to-Remove   Blue   This LED is lit to indicate that a hard drive can be removed safely during a hot-plug operation.
2    Service Required  Amber  This LED is lit when the system is running and the hard drive is faulty.
3    OK/Activity       Green  This LED lights when data is being read from or written to the hard drive.

The front and rear panel Service Required LEDs are also lit if the system detects a
hard drive fault.

Related Information
■ “Hard Drive Device Identifiers” on page 79

Servicing Fan Trays
Four fan trays are located toward the front of the server, arranged in two N+1
redundant pairs. Each fan tray contains a fan mounted in an integrated,
hot-swappable CRU. If a fan tray fails, replace it as soon as possible to maintain
server availability.

Caution – Hazardous moving parts. Unless the power to the server is completely
shut down, the only service permitted in the fan compartment is the replacement of
the fan trays by trained personnel.

Related Information
■ “Identifying Server Components” on page 1
■ “Managing Faults” on page 9
■ “Powering Off the System” on page 63
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Fan Tray Device Identifiers” on page 84
■ “Fan Tray Fault LED” on page 84
■ “Server Components” on page 173

▼ Remove a Fan Tray (Hot-Swap)


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Perform the task, “Extend the Server to the Maintenance Position” on page 66.
■ Perform the task, “Perform Electrostatic Discharge – Antistatic Prevention
Measures” on page 69.

Do the following:

1. Identify the fan tray to be removed.


See “Fan Tray Device Identifiers” on page 84 and “Fan Tray Fault LED” on
page 84.

2. Press the fan tray latches toward the center of the fan tray and pull the fan tray
up and out of the system.

FIGURE: Removing a Fan Tray

▼ Install a Fan Tray (Hot-Swap)


1. Slide the fan tray into its bay until it locks into place.
Ensure that the fan tray is oriented correctly. Airflow in the system is from front to
back.

2. Verify proper fan tray operation.


See “Fan Tray Fault LED” on page 84.

Next Steps
If you are replacing a faulty fan tray due to an overtemperature condition, monitor
the system to ensure proper cooling.
■ “Slide the Server Into the Rack” on page 151
■ If you performed any additional service procedures, see “Power On the Server” on
page 153.

▼ Remove a Fan Tray
If you are removing the fan trays as a prerequisite for another service procedure,
follow the steps in this procedure.

Before you begin, complete these tasks:


■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ Perform the task, “Extend the Server to the Maintenance Position” on page 66
■ Perform the task, “Perform Electrostatic Discharge – Antistatic Prevention
Measures” on page 69

Do the following:

● Press the fan tray latches toward the center of the fan tray and pull the fan tray
up and out of the system.

FIGURE: Removing a Fan Tray

▼ Install a Fan Tray
1. Slide each fan tray into its bay until it locks into place.
Ensure that the fan tray is oriented correctly. Airflow in the system is from front to
back.

2. Verify proper fan tray operation.


See “Fan Tray Fault LED” on page 84.

Next Steps
If you are replacing the fan trays after performing another service procedure,
complete these steps.
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

Fan Tray Device Identifiers


TABLE: Fan Tray Device Identifiers on page 84 describes the FRU device names for
the fan trays in the server.

TABLE: Fan Tray Device Identifiers

Device Device Identifier

FT0 /SYS/MB/FT0
FT1 /SYS/MB/FT1
FT2 /SYS/MB/FT2
FT3 /SYS/MB/FT3

Related Information
■ “Managing Faults” on page 9
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Fan Tray Fault LED” on page 84

Fan Tray Fault LED


Each fan tray contains a Fault LED that is located on the top panel of the server. The
LED is visible when you slide the server partially out of the rack.

See TABLE: Fan Tray Fault LED on page 85 for a description of the fan tray Fault
LED and its function.

TABLE: Fan Tray Fault LED

LED Color Notes

Fault Amber This LED is lit when the fan tray is faulty.

The front panel Fan Fault LED, and the front and rear panel Service Required LEDs
are also lit if the system detects a fan tray fault. In addition, the system Overtemp
LED might be lit if a fan fault causes an increase in system operating temperature.

See “Front Panel LEDs” on page 4 and “Rear Panel LEDs” on page 7 for more
information about system status LEDs.

Related Information
■ “Managing Faults” on page 9
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Fan Tray Device Identifiers” on page 84

Servicing Power Supplies


The server is equipped with redundant hot-swappable power supplies. Redundant
power supplies enable you to remove and replace a power supply without shutting
the server down, provided that at least two other power supplies are online and
working.
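
The redundancy rule above can be checked before you pull a supply. The following Python sketch is a minimal illustration: the function name and the status dictionary are hypothetical, and the caller is assumed to have already determined which supplies are online and working (for example, from the LEDs or the ILOM show faulty command).

```python
from typing import Dict


def can_hot_swap_psu(status: Dict[str, bool], target: str) -> bool:
    """Return True if at least two supplies other than the target are working.

    status maps a power supply name (for example "/SYS/PS0") to True when
    that supply is online and working.
    """
    if target not in status:
        raise KeyError(f"unknown power supply: {target}")
    others_working = sum(
        1 for name, working in status.items() if name != target and working
    )
    return others_working >= 2
```

If the function returns `False`, shut the server down before removing the supply.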

Note – If a power supply fails and you do not have a replacement available, leave
the failed power supply installed to ensure proper airflow in the server.

Related Information
■ “Identifying Server Components” on page 1
■ “Managing Faults” on page 9
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Power Supply Device Identifiers” on page 91
■ “Power Supply LED” on page 91
■ “Server Components” on page 173
■ SPARC Enterprise T5440 Server Site Planning Guide

▼ Remove a Power Supply (Hot-Swap)

Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.

Note – If you are servicing Power Supply 0, you must disconnect the cable
management arm support strut.

1. Identify which power supply requires replacement.


An amber LED on a power supply indicates that a failure was detected. In
addition, the show faulty command indicates which power supply is faulty. See
“Detecting Faults” on page 30.

2. Gain access to the rear of the server where the faulty power supply is located.
If necessary, slide the system partially out of the rack to obtain better access to the
rear panel.

3. Disconnect the power cord from the faulty power supply.

4. Grasp the power supply handle and press the release latch.

FIGURE: Removing a Power Supply

5. Pull the power supply out of the chassis.

▼ Install a Power Supply (Hot-Swap)


1. Align the replacement power supply with the empty power supply bay.

2. Slide the power supply into the bay until it is fully seated.

FIGURE: Installing a Power Supply

3. Reconnect the power cord to the power supply.


Verify that the power supply LED is green or blinking green.

4. Verify that the system Power Supply Fault LED, and the front and rear Service
Required LEDs are not lit.

Note – See “Front Panel LEDs” on page 4 and “Rear Panel LEDs” on page 7 for more
information about identifying and interpreting system LEDs.

5. At the ILOM -> prompt, use the show faulty command to verify the status of
the power supplies.

▼ Remove a Power Supply

Caution – Hazardous voltages are present. To reduce the risk of electric shock and
danger to personal health, follow the instructions.

If you are removing the power supplies as a prerequisite for another service
procedure, follow these steps.

Before you begin, complete these tasks:


■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

Note – If you are servicing Power Supply 0, you must disconnect the cable
management arm support strut.

1. Grasp the power supply handle and press the release latch.

FIGURE: Removing a Power Supply

2. Pull the power supply out of the chassis.

▼ Install a Power Supply
If you are installing the power supplies following another service task, complete
these steps.

1. Align the replacement power supply with the empty power supply bay.

FIGURE: Installing a Power Supply

2. Slide the power supply into the bay until it is fully seated.

Next Steps
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153

Power Supply Device Identifiers
TABLE: Power Supply FRU Names on page 91 describes the FRU device names for
power supplies in the server.

TABLE: Power Supply FRU Names

Device Device Identifier

PS0 /SYS/PS0
PS1 /SYS/PS1
PS2 /SYS/PS2
PS3 /SYS/PS3

Note – Power supply names in ILOM messages are displayed with the full FRU
name, such as /SYS/PS0.

Related Information
■ “Managing Faults” on page 9
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Power Supply LED” on page 91

Power Supply LED


Each power supply contains a dual-color LED that is visible when looking at the
back panel of the system.

See TABLE: Power Supply Status LEDs on page 91 for a description of power supply
LED modes and their function, listed from top to bottom.

TABLE: Power Supply Status LEDs

LED State       Meaning                       Notes

Off             No AC present                 Power supply is unplugged, or no AC power is present.
Blinking green  AC present/system in standby  AC power is present and system is in standby mode.
Green           AC present/system powered on  System is powered on.
Blinking amber  Fault                         Voltage overcurrent or other power fault.
Amber           Fault                         Internal power supply failure or power supply fan failure.
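
For monitoring scripts or service checklists, the LED states in the table can be encoded as a lookup. This Python sketch is illustrative; the function names are hypothetical, and the state strings and meanings are taken from the table above.

```python
# LED state -> meaning, as listed in TABLE: Power Supply Status LEDs.
PSU_LED_STATES = {
    "off": "No AC present",
    "blinking green": "AC present/system in standby",
    "green": "AC present/system powered on",
    "blinking amber": "Fault",
    "amber": "Fault",
}


def psu_led_meaning(state: str) -> str:
    """Return the meaning of an observed LED state (case and spacing tolerant)."""
    key = " ".join(state.lower().split())
    if key not in PSU_LED_STATES:
        raise ValueError(f"unrecognized LED state: {state!r}")
    return PSU_LED_STATES[key]


def psu_has_fault(state: str) -> bool:
    """Return True when the LED state indicates a power supply fault."""
    return psu_led_meaning(state) == "Fault"
```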

The following LEDs are lit when a power supply fault is detected:
■ Front and rear Service Required LEDs
■ Rear PS Failure LED on the bezel of the server
■ Fault LED mode on the faulty power supply

The front and rear panel Service Required LEDs are also lit if the system detects a
power supply fault.

See “Front Panel LEDs” on page 4 and “Rear Panel LEDs” on page 7 for more
information about identifying and interpreting system LEDs.

See “Power Supply LED” on page 91 for specific information about power supply
status LEDs.

Related Information
■ “Managing Faults” on page 9
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Front Panel LEDs” on page 4
■ “Rear Panel LEDs” on page 7

Servicing PCIe Cards


Up to eight low-profile PCIe cards may be installed in the system. All slots are wired
to x8 PCIe lanes. Slot 1 and Slot 7 support graphics cards with x16 connectors. Slot 4
and Slot 5 also support 10-Gbit Ethernet cards (XAUI cards). When a XAUI card is
installed, a PCIe card cannot be installed in the same slot.

Related Information
■ “PCIe Device Identifiers” on page 96
■ “PCIe Slot Configuration Guidelines” on page 97
■ “Performing Node Reconfiguration” on page 155

▼ Remove a PCIe Card


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

Do the following:

1. Identify the PCIe card you want to remove.

2. Open the PCIe card latch.

FIGURE: Removing a PCIe Card

3. Remove the PCIe card from the system.

4. Place the PCIe card on an antistatic mat.

5. If you are not replacing the PCIe card, install a PCIe filler panel in its place.

6. Close the PCIe card latch.

▼ Install a PCIe Card
1. Identify the correct slot for installation.

2. Open the PCIe card latch.

FIGURE: Installing a PCIe Card

3. Insert the PCIe card into its slot.

4. Close the PCIe card latch.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

▼ Add a PCIe Card


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

1. Identify the correct slot for installation.


See “PCIe Device Identifiers” on page 96 and “PCIe Slot Configuration
Guidelines” on page 97.

2. Open the PCIe card latch.

3. Remove the PCIe filler panel.

4. Insert the PCIe card into its slot.

FIGURE: Installing a PCIe Card

5. Close the PCIe card latch.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

PCIe Device Identifiers
TABLE: PCIe Device Identifiers on page 96 describes device and device identifiers for
PCIe cards. Device identifiers are case-sensitive.

TABLE: PCIe Device Identifiers

Device Device Identifier Notes

PCIe0 /SYS/MB/PCIE0 x8 slot


PCIe1 /SYS/MB/PCIE1 x16 slot operating at x8
PCIe2 /SYS/MB/PCIE2 x8 slot
PCIe3 /SYS/MB/PCIE3 x8 slot
PCIe4 /SYS/MB/PCIE4 or x8 slot; shared with XAUI slot
(XAUI0) /SYS/MB/XAUI0
PCIe5 /SYS/MB/PCIE5 or x8 slot; shared with XAUI slot
(XAUI1) /SYS/MB/XAUI1
PCIe6 /SYS/MB/PCIE6 x16 slot operating at x8
PCIe7 /SYS/MB/PCIE7 x8 slot

Note – PCIe names in ILOM messages are displayed with the full FRU name, such
as /SYS/MB/PCIE0.

Note – In the Solaris OS, PCIe slot addresses are associated with CMP modules. The
PCIe slot address in the Solaris OS might change if you add or remove CMP
modules, or if a CMP module is brought offline. For more information, see the
SPARC Enterprise T5440 Server Product Notes.
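
Because slots 4 and 5 are shared with the XAUI slots, a slot number can map to more than one device identifier. The following Python sketch mirrors the table above; the function name is hypothetical, and the identifiers are returned in the exact case shown in the table because they are case-sensitive.

```python
from typing import List

PCIE_SLOTS = 8                 # PCIe0 through PCIe7
XAUI_SHARED = {4: 0, 5: 1}     # PCIe slot number -> XAUI number sharing that slot


def pcie_identifiers(slot: int) -> List[str]:
    """Return every device identifier that can appear for a PCIe slot number."""
    if not 0 <= slot < PCIE_SLOTS:
        raise ValueError(f"slot must be 0..{PCIE_SLOTS - 1}, got {slot}")
    identifiers = [f"/SYS/MB/PCIE{slot}"]
    if slot in XAUI_SHARED:
        identifiers.append(f"/SYS/MB/XAUI{XAUI_SHARED[slot]}")
    return identifiers
```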

Related Information
■ “Managing Faults” on page 9
■ “PCIe Slot Configuration Guidelines” on page 97
■ “System Bus Topology” on page 162
■ “Performing Node Reconfiguration” on page 155

PCIe Slot Configuration Guidelines
Use the guidelines in TABLE: PCIe Slot Configuration Guidelines on page 97 to
spread the load evenly across CMP/memory modules. If a slot is already
populated with a device, install a new device in the next available slot, in the order
indicated.

TABLE: PCIe Slot Configuration Guidelines

PCIe/XAUI Card Type                         Number of CMP/Memory Modules  Installation Order           Notes

10-Gbit Ethernet (XAUI) card                1, 2, 3 or 4                  Slot 4, 5                    Install XAUI cards first.
External I/O Expansion Unit PCIe Link card  2                             Slot 0, 4, 1, 5              Maximum of 4 cards; install in order shown.
External I/O Expansion Unit PCIe Link card  4                             Slot 0, 4, 2, 6, 1, 5, 3, 7  Maximum of 8 cards; install in order shown.
All other devices*                          2                             Slot 0, 4, 1, 5, 2, 6, 3, 7  Maximum of 8 cards; install in order shown.
All other devices*                          4                             Slot 0, 4, 2, 6, 1, 5, 3, 7  Maximum of 8 cards; install in order shown.

* These are guidelines to spread out the I/O load across multiple CMP/memory module pairs. These are not configuration restrictions.

External I/O Expansion Unit PCIe Link cards must be placed in a PCIe slot with a
CMP/memory module pair present as follows:
■ PCIe Slots 0 and 1 require CMP/Memory pair 0.
■ PCIe Slots 4 and 5 require CMP/Memory pair 1.
■ PCIe Slots 2 and 3 require CMP/Memory pair 2.
■ PCIe Slots 6 and 7 require CMP/Memory pair 3.
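
The installation order and the slot-to-pair requirements can be combined into a small planning helper. This Python sketch is an illustration only: the card-type labels ("xaui", "link", "other") and the function name are hypothetical, and the slot orders are transcribed from TABLE: PCIe Slot Configuration Guidelines. Module counts that the table does not list for a card type are rejected rather than guessed.

```python
from typing import Dict, Iterable, List, Optional

# card type -> number of CMP/memory modules -> installation order
INSTALL_ORDER: Dict[str, Dict[int, List[int]]] = {
    "xaui": {1: [4, 5], 2: [4, 5], 3: [4, 5], 4: [4, 5]},
    "link": {2: [0, 4, 1, 5], 4: [0, 4, 2, 6, 1, 5, 3, 7]},
    "other": {2: [0, 4, 1, 5, 2, 6, 3, 7], 4: [0, 4, 2, 6, 1, 5, 3, 7]},
}

# PCIe slot -> CMP/memory pair that must be present for a PCIe Link card
SLOT_TO_PAIR = {0: 0, 1: 0, 4: 1, 5: 1, 2: 2, 3: 2, 6: 3, 7: 3}


def next_slot(card_type: str, modules: int, occupied: Iterable[int]) -> Optional[int]:
    """Return the next free slot in the recommended order, or None if all are used."""
    try:
        order = INSTALL_ORDER[card_type][modules]
    except KeyError:
        raise ValueError(
            f"no guideline for card type {card_type!r} with {modules} module(s)"
        ) from None
    taken = set(occupied)
    for slot in order:
        if slot not in taken:
            return slot
    return None
```

For example, with two CMP/memory modules and slots 0 and 4 already populated, `next_slot("other", 2, [0, 4])` returns slot 1.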

Related Information
■ “PCIe Device Identifiers” on page 96
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165

Servicing CMP/Memory Modules
Up to four CMP/memory modules can be installed in the system. Each CMP module
is paired with a memory module. CMP modules and memory modules are keyed
uniquely to prevent insertion into the wrong type of slot.

A faulty CMP or memory module is indicated by an illuminated fault LED. An
illuminated module LED also might indicate a faulty FB-DIMM on that module.

FIGURE: CMP/Memory Module Pairs

Related Information
■ “CMP and Memory Module Device Identifiers” on page 103
■ “Supported CMP/Memory Module Configurations” on page 104
■ “I/O Connections to CMP/Memory Modules” on page 156
■ “Reconfiguring I/O Device Nodes” on page 158
■ “Servicing FB-DIMMs” on page 104
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165

▼ Remove a CMP/Memory Module


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

Do the following:

1. Identify the module you want to remove.

2. Rotate the ejector levers up and away from the module.

FIGURE: Removing a CMP Module

3. Slide the module up and out of the system.

4. Place the module on an antistatic mat.

▼ Install a CMP/Memory Module

Note – If you are replacing a faulty CMP or memory module, you must transfer the
FB-DIMMs on the faulty module to the replacement module. Replacement
CMP/memory modules do not include FB-DIMMs.

For more information about installing FB-DIMMs, see “Servicing FB-DIMMs” on
page 104.

1. Identify the correct slot for installation.

2. Slide the module down into its slot.

FIGURE: Installing a CMP/Memory Module

3. Rotate the ejector levers down to secure the module into place.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

▼ Add a CMP/Memory Module


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

Do the following:

1. Identify the correct slot for installation.

2. Remove the air baffle.
Squeeze the air baffle latches toward each other and lift the air baffle straight up
and out of the chassis.

3. If you are installing the module into a previously empty slot, remove the plastic
connector cover on the motherboard.

4. Slide the module down into its slot.

FIGURE: Installing a CMP Module

5. Rotate the ejector levers down to secure the module into place.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

CMP and Memory Module Device Identifiers
TABLE: CMP/Memory Module Device Identifier on page 103 describes device,
device identifiers, and supported configurations for CMP and memory modules.
Device identifiers are case-sensitive.

TABLE: CMP/Memory Module Device Identifier

Device Device Identifier

CMP0 /SYS/MB/CPU0/CMP0
MEM0 /SYS/MB/MEM0/CMP0
CMP1 /SYS/MB/CPU1/CMP1
MEM1 /SYS/MB/MEM1/CMP1
CMP2 /SYS/MB/CPU2/CMP2
MEM2 /SYS/MB/MEM2/CMP2
CMP3 /SYS/MB/CPU3/CMP3
MEM3 /SYS/MB/MEM3/CMP3

Note – CMP and memory module names in ILOM messages are displayed with the
full FRU name, such as /SYS/MB/CPU0.

Related Information
■ “Managing Faults” on page 9
■ “Supported FB-DIMM Configurations” on page 110
■ “Performing Node Reconfiguration” on page 155

Supported CMP/Memory Module Configurations
TABLE: Supported CMP/Memory Module Configurations on page 104 shows the
supported CMP/memory module configurations, as viewed from the front of the
server.

TABLE: Supported CMP/Memory Module Configurations

CMP3 CMP1 CMP2 CMP0


Configuration MEM3 MEM1 MEM2 MEM0

One CMP/memory pair X


Two CMP/memory pairs X X
Three CMP/memory pairs X X X
Four CMP/memory pairs X X X X
(full configurations)

Related Information
■ “CMP and Memory Module Device Identifiers” on page 103
■ “Performing Node Reconfiguration” on page 155

Servicing FB-DIMMs
Up to 16 FB-DIMMs can be installed in each CMP/memory module pair.

Related Information
■ “Managing Faults” on page 9
■ “Remove FB-DIMMs” on page 105
■ “Install FB-DIMMs” on page 105
■ “Verify FB-DIMM Replacement” on page 106
■ “Add FB-DIMMs” on page 109
■ “Supported FB-DIMM Configurations” on page 110
■ “FB-DIMM Device Identifiers” on page 112
■ “FB-DIMM Fault Button Locations” on page 113
■ “Servicing CMP/Memory Modules” on page 98
■ “Performing Node Reconfiguration” on page 155

▼ Remove FB-DIMMs
Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a CMP/Memory Module” on page 99

Do the following:

1. If you are removing a faulty FB-DIMM, determine which FB-DIMM you want
to remove.

a. Press the FB-DIMM fault button.


See “FB-DIMM Fault Button Locations” on page 113.

b. Note which FB-DIMM fault LED is illuminated.

2. Push down on the ejector tabs on each side of the FB-DIMM until the
FB-DIMM is released.

Caution – FB-DIMMs might be hot. Use caution when servicing FB-DIMMs.

3. Grasp the top corners of the faulty FB-DIMM and remove it from the
CMP/memory module.

4. Place the FB-DIMM on an antistatic mat.

5. Repeat Step 2 through Step 4 to remove any additional FB-DIMMs.

▼ Install FB-DIMMs
1. Unpackage the replacement FB-DIMMs and place them on an antistatic mat.

Tip – See “Supported FB-DIMM Configurations” on page 110 for information about
configuring the FB-DIMMs.

2. Ensure that the ejector tabs are in the open position.

3. Line up the replacement FB-DIMM with the connector.
Align the FB-DIMM notch with the key in the connector. This ensures that the
FB-DIMM is oriented correctly.

4. Push the FB-DIMM into the connector until the ejector tabs lock the FB-DIMM
in place.
If the FB-DIMM does not easily seat into the connector, verify that the orientation
of the FB-DIMM is correct. If the orientation is reversed, damage to the FB-DIMM
might occur.

5. Repeat Step 2 through Step 4 until all replacement FB-DIMMs are installed.

Next Steps
■ “Install a CMP/Memory Module” on page 100
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

▼ Verify FB-DIMM Replacement


1. Access the ILOM -> prompt.
Refer to the Integrated Lights Out Manager 3.0 Supplement for SPARC Enterprise
T5440 Server for instructions.

2. Run the show faulty command to determine how to clear the fault.
The method you use to clear a fault depends on how the fault is identified by the
show faulty command.
Examples:
■ If the fault is a host-detected fault (displays a UUID), continue to Step 3. For
example:

-> show faulty


Target              | Property               | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/CPU0/CMP0/BR0/CH1/D0
/SP/faultmgmt/0     | timestamp              | Dec 14 22:43:59
/SP/faultmgmt/0/    | sunw-msg-id            | SUN4V-8000-DX
 faults/0           |                        |
/SP/faultmgmt/0/    | uuid                   | 3aa7c854-9667-e176-efe5-e487e520
 faults/0           |                        | 7a8a
/SP/faultmgmt/0/    | timestamp              | Dec 14 22:43:59
 faults/0           |                        |

■ In most cases, if the fault was detected by POST and resulted in the FB-DIMM
being disabled (such as the following example), the replacement of the faulty
FB-DIMM is detected when the service processor is power cycled. In this case,
the fault is automatically cleared from the system.

-> show faulty


Target              | Property               | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0     | timestamp              | Dec 21 16:40:56
/SP/faultmgmt/0/    | timestamp              | Dec 21 16:40:56
 faults/0           |                        |
/SP/faultmgmt/0/    | sp_detected_fault      | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
 faults/0           |                        | Forced fail(POST)

If the fault is still displayed by the show faulty command, then run the set
command to enable the FB-DIMM and clear the fault.
Example:

-> set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=Enabled

3. Perform the following steps to verify the repair:

a. Set the virtual keyswitch to diag so that POST will run in Service mode.

-> set /SYS keyswitch_state=Diag


Set ‘keyswitch_state’ to ‘Diag’

b. Power cycle the system.

-> stop /SYS


Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

Note – The server takes about one minute to power off. Use the show /HOST
command to determine when the host has been powered off. The console will display
status=Powered Off.

c. Switch to the system console to view POST output.

-> start /SP/console

Watch the POST output for possible fault messages. The following output is a
sign that POST did not detect any faults:

.
.
.
0:0:0>INFO:
0:0:0> POST Passed all devices.
0:0:0>POST: Return to VBSC.
0:0:0>Master set ACK for vbsc runpost command and spin...

Note – Depending on the configuration of the ILOM POST variables and on whether
POST detected faults, the system might boot or might remain at the ok prompt.
If the system is at the ok prompt, type boot.

d. Return the virtual keyswitch to Normal mode.

-> set /SYS keyswitch_state=Normal


Set ‘keyswitch_state’ to ‘Normal’

e. Switch to the system console and issue the Solaris OS fmadm faulty
command.

# fmadm faulty

No memory faults should be displayed.


If faults are reported, refer to the diagnostics flowchart in FIGURE: Diagnostic
Flowchart on page 11 for an approach to diagnose the fault.

4. Switch to the ILOM command shell.

5. Run the show faulty command.


■ If the fault was detected by the host and the fault information persists, the
output will be similar to the following example:

-> show faulty


Target              | Property               | Value
--------------------+------------------------+--------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/CPU0/CMP0/BR0/CH1/D0
/SP/faultmgmt/0     | timestamp              | Dec 14 22:43:59
/SP/faultmgmt/0/    | sunw-msg-id            | SUN4V-8000-DX
 faults/0           |                        |
/SP/faultmgmt/0/    | uuid                   | 3aa7c854-9667-e176-efe5-e487e520
 faults/0           |                        | 7a8a
/SP/faultmgmt/0/    | timestamp              | Dec 14 22:43:59
 faults/0           |                        |

■ If the show faulty command does not report a fault with a UUID, then you
do not need to proceed with the following step because the fault is cleared.

6. Run the set command.

-> set /SYS/MB/CPU0/CMP0/BR0/CH1/D0 clear_fault_action=True


Are you sure you want to clear /SYS/MB/CPU0/CMP0/BR0/CH1/D0 (y/n)? y
Set ‘clear_fault_action’ to ‘true’
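The decision made in this procedure (a fault that displays a UUID was detected by the host, while a fault that displays sp_detected_fault was detected by the service processor, for example by POST) can be sketched in Python as follows. This is a hedged illustration, not an ILOM tool: the function name and the returned labels are hypothetical, and the sketch assumes that show faulty output has been captured as text in the three-column layout shown in the examples above.

```python
def fault_source(show_faulty_output):
    """Classify captured `show faulty` text by the properties it lists.

    Returns "host-detected" if a uuid property is present,
    "sp-detected" if an sp_detected_fault property is present,
    and "none" otherwise.
    """
    properties = set()
    for line in show_faulty_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows have three columns; wrapped continuation rows
        # leave the Property column empty and are skipped.
        if len(fields) == 3 and fields[1]:
            properties.add(fields[1])
    if "uuid" in properties:
        return "host-detected"
    if "sp_detected_fault" in properties:
        return "sp-detected"
    return "none"
```

A result of "host-detected" corresponds to the case in which the fault is cleared with clear_fault_action; "sp-detected" corresponds to the case in which the fault usually clears on its own or is cleared by setting component_state.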

▼ Add FB-DIMMs
If you are upgrading the system with additional FB-DIMMs, use this procedure.

Before you begin, complete these tasks:


■ Read the section, “Safety Information” on page 59.
■ Read the sections, “Supported FB-DIMM Configurations” on page 110 and
“FB-DIMM Device Identifiers” on page 112.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a CMP/Memory Module” on page 99

1. Unpackage the FB-DIMMs and place them on an antistatic mat.

2. Ensure that the ejector tabs are in the open position.

3. Line up the FB-DIMM with the connector.


Align the FB-DIMM notch with the key in the connector. This ensures that the
FB-DIMM is oriented correctly.

4. Push the FB-DIMM into the connector until the ejector tabs lock the FB-DIMM
in place.
If the FB-DIMM does not easily seat into the connector, verify that the orientation
of the FB-DIMM is correct. If the orientation is reversed, damage to the FB-DIMM
might occur.

5. Repeat Step 2 through Step 4 until all the FB-DIMMs are installed.

Next Steps
■ “Install a CMP/Memory Module” on page 100
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

Supported FB-DIMM Configurations


Use these FB-DIMM configuration rules to help you plan the memory configuration
of your server:
■ Each CMP/memory module pair holds 16 industry-standard FB-DIMMs.
■ 4 FB-DIMM slots are located on the CMP module.
■ 12 FB-DIMM slots are located on the memory module.
■ All FB-DIMMs in the system must be the same density (same capacity).
■ At minimum, Channel 0, FB-DIMM Slot 0 in all branches must be populated.
■ In branches populated with more than one FB-DIMM (for example, in 8 and 16
FB-DIMM configurations), FB-DIMMs are addressed in pairs. Each pair must be
identical (same part number).
■ A replacement FB-DIMM must have the same part number as the other FB-DIMM
in its pair. For example, a replacement FB-DIMM in J1201 must have the same part
number as the FB-DIMM in J1401, in order to ensure an identical pair.
■ If you are unable to obtain a matching FB-DIMM, you must replace both
FB-DIMMs in the pair.

Each CMP/memory module pair supports the following configurations:


■ 4 FB-DIMMs (Group 1)
■ 8 FB-DIMMs (Groups 1 and 2)
■ 16 FB-DIMMs (Groups 1, 2, and 3) (fully populated configuration)
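As a planning aid, the supported counts and the same-density rule above can be expressed as a short Python sketch. The names used here are hypothetical, and the sketch checks only the two rules it names. It does not check the pairing (same part number) rules.

```python
# Supported FB-DIMM counts per CMP/memory module pair, mapped to the
# FB-DIMM groups that are populated (taken from the list above).
SUPPORTED_CONFIGURATIONS = {4: (1,), 8: (1, 2), 16: (1, 2, 3)}


def groups_for(dimm_count):
    """Return the FB-DIMM groups populated for a supported FB-DIMM count."""
    if dimm_count not in SUPPORTED_CONFIGURATIONS:
        raise ValueError(f"unsupported FB-DIMM count: {dimm_count}")
    return SUPPORTED_CONFIGURATIONS[dimm_count]


def same_density(capacities_gb):
    """Return True if every listed FB-DIMM has the same capacity."""
    return len(set(capacities_gb)) <= 1
```

For example, groups_for(8) returns (1, 2), and a count of 12 raises ValueError because 12 FB-DIMMs per pair is not a listed configuration.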

FIGURE: Supported FB-DIMM Configurations

Figure Legend

1 Configuration 1: 4 FB-DIMMs (4 on CMP Module Only)


2 Configuration 2: 8 FB-DIMMs (4 on CMP Module, 4 on Memory Module)
3 Configuration 3: 16 FB-DIMMs (4 on CMP Module, 12 on Memory Module)

Note – See TABLE: FB-DIMM Configurations and Device Identifiers on page 112 for
a list of device identifiers and the corresponding slots on the CMP/memory modules.

Related Information
■ “Managing Faults” on page 9
■ “FB-DIMM Device Identifiers” on page 112
■ “FB-DIMM Fault Button Locations” on page 113
■ “Performing Node Reconfiguration” on page 155

FB-DIMM Device Identifiers
TABLE: FB-DIMM Configurations and Device Identifiers on page 112 describes
device and device identifiers for FB-DIMMs on a CMP and memory module pair.
Device identifiers are case-sensitive.

The FB-DIMM address follows the same convention as the CMP or memory module
upon which it is mounted. For example, /SYS/MB/CPU0/CMP0/BR1/CH0/D0 is the
device identifier for the FB-DIMM mounted at J792 on CMP module 0.

TABLE: FB-DIMM Configurations and Device Identifiers

Location        FB-DIMM Device Identifier       Connector Number   FB-DIMM Group

CMP module      /SYS/MB/CPUx/CMPx/BR1/CH0/D0    J792               Group 1* (4 FB-DIMMs)
                /SYS/MB/CPUx/CMPx/BR1/CH1/D0    J896
                /SYS/MB/CPUx/CMPx/BR0/CH0/D0    J585
                /SYS/MB/CPUx/CMPx/BR0/CH1/D0    J687
                Motherboard connector

Memory module   /SYS/MB/MEMx/CMPx/BR1/CH1/D2    J1471              Group 2 (4 FB-DIMMs)
                /SYS/MB/MEMx/CMPx/BR1/CH1/D3    J1573
                /SYS/MB/MEMx/CMPx/BR1/CH0/D2    J1066
                /SYS/MB/MEMx/CMPx/BR1/CH0/D3    J1167
                /SYS/MB/MEMx/CMPx/BR0/CH1/D2    J847               Group 3 (8 FB-DIMMs)
                /SYS/MB/MEMx/CMPx/BR0/CH1/D3    J948
                /SYS/MB/MEMx/CMPx/BR0/CH0/D2    J660
                /SYS/MB/MEMx/CMPx/BR0/CH0/D3    J762
                /SYS/MB/MEMx/CMPx/BR0/CH1/D1    J746
                /SYS/MB/MEMx/CMPx/BR0/CH0/D1    J511
                /SYS/MB/MEMx/CMPx/BR1/CH0/D1    J927
                /SYS/MB/MEMx/CMPx/BR1/CH1/D1    J1344
                Motherboard connector

* Minimum configuration.
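When a fault message reports an FB-DIMM device identifier, the table above can be used as a lookup to find the connector and group. The following Python sketch shows one way to do that. The function and variable names are hypothetical. The connector numbers and groups are copied from the table, and the sketch assumes that the CPUx or MEMx index always matches the CMPx index, as it does in every identifier shown in this manual.

```python
import re

# (board, slot) -> (connector number, FB-DIMM group), copied from the table.
SLOTS = {
    ("CPU", "BR1/CH0/D0"): ("J792", 1),
    ("CPU", "BR1/CH1/D0"): ("J896", 1),
    ("CPU", "BR0/CH0/D0"): ("J585", 1),
    ("CPU", "BR0/CH1/D0"): ("J687", 1),
    ("MEM", "BR1/CH1/D2"): ("J1471", 2),
    ("MEM", "BR1/CH1/D3"): ("J1573", 2),
    ("MEM", "BR1/CH0/D2"): ("J1066", 2),
    ("MEM", "BR1/CH0/D3"): ("J1167", 2),
    ("MEM", "BR0/CH1/D2"): ("J847", 3),
    ("MEM", "BR0/CH1/D3"): ("J948", 3),
    ("MEM", "BR0/CH0/D2"): ("J660", 3),
    ("MEM", "BR0/CH0/D3"): ("J762", 3),
    ("MEM", "BR0/CH1/D1"): ("J746", 3),
    ("MEM", "BR0/CH0/D1"): ("J511", 3),
    ("MEM", "BR1/CH0/D1"): ("J927", 3),
    ("MEM", "BR1/CH1/D1"): ("J1344", 3),
}

# Identifiers are case-sensitive, so the pattern is matched as written.
_PATTERN = re.compile(
    r"^/SYS/MB/(CPU|MEM)([0-3])/CMP([0-3])/(BR[01]/CH[01]/D[0-3])$"
)


def locate_fb_dimm(identifier):
    """Return (location, connector number, group) for an FB-DIMM identifier."""
    match = _PATTERN.match(identifier)
    if match is None or match.group(2) != match.group(3):
        raise ValueError(f"not an FB-DIMM device identifier: {identifier}")
    board, index, _, slot = match.groups()
    if (board, slot) not in SLOTS:
        raise ValueError(f"no such FB-DIMM slot on this module: {identifier}")
    connector, group = SLOTS[(board, slot)]
    location = "CMP module" if board == "CPU" else "memory module"
    return f"{location} {index}", connector, group
```

For example, locate_fb_dimm("/SYS/MB/CPU0/CMP0/BR1/CH0/D0") returns ("CMP module 0", "J792", 1), which matches the example given in the text above the table.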

Related Information
■ “Managing Faults” on page 9
■ “Supported FB-DIMM Configurations” on page 110
■ “FB-DIMM Fault Button Locations” on page 113
■ “Performing Node Reconfiguration” on page 155

FB-DIMM Fault Button Locations
FIGURE: FB-DIMM Fault Button Locations on page 114 shows the location of the
FB-DIMM fault buttons on the CMP module and the memory module. Press this
button to illuminate the fault indicator on the module. Replace the FB-DIMM
identified by the indicator.

Note – You must replace a faulty FB-DIMM with an identical part (same part
number). See “Supported FB-DIMM Configurations” on page 110 for more
information.

FIGURE: FB-DIMM Fault Button Locations

Related Information
■ “Managing Faults” on page 9
■ “Supported FB-DIMM Configurations” on page 110
■ “FB-DIMM Device Identifiers” on page 112

Servicing Field-Replaceable Units

These topics describe how to service field-replaceable units (FRUs) in the SPARC
Enterprise T5440 server.

Note – The procedures in this chapter must be performed by a qualified service
technician.

Topic                                  Links

Remove and install field-replaceable   “Servicing the Front Bezel” on page 115
components.                            “Servicing the DVD-ROM Drive” on page 118
                                       “Servicing the Service Processor” on page 120
                                       “Servicing the IDPROM” on page 123
                                       “Servicing the Battery” on page 125
                                       “Servicing the Power Distribution Board” on page 126
                                       “Servicing the Fan Tray Carriage” on page 129
                                       “Servicing the Hard Drive Backplane” on page 132
                                       “Servicing the Motherboard” on page 135
                                       “Servicing the Flex Cable Assembly” on page 140
                                       “Servicing the Front Control Panel” on page 144
                                       “Servicing the Front I/O Board” on page 146
Exploded views of FRUs                 “Field-Replaceable Units” on page 176

Servicing the Front Bezel


You must remove the front bezel in order to service the DVD-ROM drive.

Related Information
■ “Servicing the DVD-ROM Drive” on page 118

▼ Remove the Front Bezel
Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ If you are performing additional service procedures, power off the server, using
one of the methods described in the section, “Powering Off the System” on
page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69

Do the following:

1. Grasp the front bezel on the left and right sides.

2. Pull the bezel off of the front of the chassis.


The bezel is secured with three snap-in posts.

FIGURE: Removing the Front Bezel

Note – To avoid bending the bezel, pull it gradually from the middle and both
ends simultaneously.

▼ Install the Front Bezel


1. Align the bezel with the chassis front panel.

2. Press the bezel onto the front panel.


The bezel is oriented with four guide pins, and is secured with three snap-in posts.

Next Steps
■ “Slide the Server Into the Rack” on page 151

■ If you performed any additional service procedures, see “Power On the Server” on
page 153.

Servicing the DVD-ROM Drive


You must remove the front bezel before servicing the DVD-ROM drive.

Related Information
■ “Servicing the Front Bezel” on page 115

▼ Remove the DVD-ROM Drive


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove the Front Bezel” on page 116

Do the following:

1. Remove the flex cable retainer.


Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the
chassis.

2. Unplug the DVD-ROM drive from the flex cable assembly.

3. Push the DVD-ROM drive forward until it protrudes from the front of the
chassis.

FIGURE: Removing the DVD-ROM Drive

4. Slide the DVD-ROM drive out of the chassis.

▼ Install the DVD-ROM Drive


1. Slide the DVD-ROM drive into its bay.

FIGURE: Installing the DVD-ROM Drive

2. Connect the DVD-ROM drive to the flex cable assembly.

3. Install the flex cable retainer.


Place the retainer into position and tighten the captive No. 2 Phillips screw.

Next Steps
■ “Install the Front Bezel” on page 117
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

Servicing the Service Processor


The service processor module contains the service processor firmware, IDPROM, and
system battery.

Related Information
■ “Servicing the IDPROM” on page 123
■ “Servicing the Battery” on page 125

▼ Remove the Service Processor


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Disconnect Power Cords From the Server” on page 65
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

Do the following:

1. Ensure that the power cords are disconnected from the server.

2. Loosen the two captive No. 2 Phillips screws securing the service processor to
the motherboard.

FIGURE: Removing the Service Processor

3. Lift the service processor up and out of the system.

4. Place the service processor on an antistatic mat.

Next Steps
If you are replacing a faulty service processor, you must install the IDPROM onto the
new service processor. Do the following:
■ Remove the IDPROM from the old service processor. See “Remove the IDPROM”
on page 123.

■ Install the IDPROM onto the new service processor. See “Install the IDPROM” on
page 124.

▼ Install the Service Processor


1. Ensure that the power cords are disconnected from the system.

2. Lower the service processor into position.


Ensure that the service processor is oriented correctly over the motherboard
connector and the two snap-on standoffs.

FIGURE: Installing the Service Processor

3. Press down evenly to plug the service processor into the motherboard.

4. Secure the service processor with the two captive No. 2 Phillips screws.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153

Servicing the IDPROM


The IDPROM stores system parameters, such as host ID and MAC address, ILOM
configuration settings, and OpenBoot PROM configuration settings. If you are
replacing a faulty service processor, you must move the IDPROM from the old
service processor to the new one.

Related Information
■ “Servicing the Service Processor” on page 120
■ “Servicing the Battery” on page 125

▼ Remove the IDPROM


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Disconnect Power Cords From the Server” on page 65
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove the Service Processor” on page 120

1. Lift the IDPROM up, off its connector on the service processor.

FIGURE: Removing the IDPROM

2. Place the IDPROM on an antistatic mat.

▼ Install the IDPROM


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Disconnect Power Cords From the Server” on page 65
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove the Service Processor” on page 120

● Plug the IDPROM into its connector on the service processor.
Ensure that the IDPROM is oriented correctly. A notch on the IDPROM
corresponds to a similar notch on the connector.

Servicing the Battery


The battery provides the power necessary to maintain system configuration
parameters during power outages, or while the system is being serviced, stored or
relocated.

Related Information
■ “Servicing the Service Processor” on page 120
■ “Servicing the IDPROM” on page 123

▼ Remove the Battery


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Disconnect Power Cords From the Server” on page 65
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove the Service Processor” on page 120

1. Release the latch securing the battery to its holder on the service processor
board.

2. Lift the battery up and off the board.

▼ Install the Battery


1. Place the battery into its holder on the service processor board.
Ensure that the battery is oriented correctly.

2. Press the battery firmly until it snaps into place.

Next Steps
■ “Install the Service Processor” on page 122
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153

Servicing the Power Distribution Board


Main 12V power is connected to the motherboard through a bus bar. Standby power
and other control signals are routed through the flex cable circuit to the motherboard.

Related Information
■ “Safety Information” on page 59
■ “Servicing Power Supplies” on page 85

▼ Remove the Power Distribution Board


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Extend the Server to the Maintenance Position” on page 66
■ “Remove a Power Supply” on page 89

Note – You must remove all four power supplies from the system.

■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69


■ “Remove the Top Cover” on page 69

Do the following:

1. Remove the flex cable retainer.
Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the
chassis.

2. Unplug the flex cable from the power distribution board.

3. Unplug the auxiliary power cable from the power distribution board.

4. Remove the No. 2 Phillips screw.

5. Remove the two 7 mm hex nuts securing the bus bars to the power distribution
board.

FIGURE: Disconnecting the Power Distribution Board From the Chassis

6. Slide the power distribution board up and out of the chassis.

▼ Install the Power Distribution Board
1. Align the keyholes in the power distribution board with the corresponding
mushroom standoffs in the chassis.

2. Lower the power distribution board into the chassis.

FIGURE: Installing the Power Distribution Board

3. Install the No. 2 Phillips screw.

4. Install the two 7 mm nuts securing the bus bars to the power distribution board.

5. Plug in the flex cable connector.


Ensure that the auxiliary power cable is routed under the flex cable connector.

6. Plug in the auxiliary power cable.

7. Install the flex cable retainer.


Place the retainer into position and tighten the captive No. 2 Phillips screw.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Install a Power Supply” on page 90

Note – Install all four power supplies.

■ “Connect the Power Cords to the Server” on page 153


■ “Power On the Server” on page 153

Servicing the Fan Tray Carriage
You must remove the fan tray carriage in order to service the following components:
■ Hard drive backplane
■ Motherboard
■ Front control panel
■ Front I/O board

Related Information
■ “Servicing Fan Trays” on page 81
■ “Servicing the Hard Drive Backplane” on page 132
■ “Servicing the Motherboard” on page 135
■ “Servicing the Front Control Panel” on page 144
■ “Servicing the Front I/O Board” on page 146

▼ Remove the Fan Tray Carriage


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove a Fan Tray” on page 83

Note – You must remove all four fan trays.

■ “Remove the Top Cover” on page 69


■ “Remove a CMP/Memory Module” on page 99

Note – You must remove all CMP modules and memory modules from the system.

Do the following:

1. Remove the nine No. 1 Phillips screws securing the fan tray carriage to the top
of the chassis.

FIGURE: Removing the Fan Tray Carriage

2. Loosen the seven captive No. 2 Phillips screws securing the bottom of the fan tray
carriage to the motherboard assembly.

3. Lift the fan tray carriage up and out of the system.

▼ Install the Fan Tray Carriage


1. Lower the fan tray carriage into the system.

FIGURE: Installing the Fan Tray Carriage

2. Secure the seven captive No. 2 Phillips screws.

3. Install the nine No. 1 Phillips screws.

Next Steps
■ “Install a Fan Tray” on page 84

Note – Install all four fan trays.

■ “Install the Top Cover” on page 150


■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153

Servicing the Hard Drive Backplane


The hard drive backplane provides the power and data interconnect to the internal
hard drives.

Related Information
■ “Servicing Hard Drives” on page 72

■ “Servicing the Fan Tray Carriage” on page 129

▼ Remove the Hard Drive Backplane


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a Hard Drive” on page 77

Note – You must remove all four hard drives from the server. Note the location of
each hard drive you remove. You must re-install each hard drive in the correct bay.

■ “Remove a Fan Tray” on page 83

Note – You must remove all four fan trays.

■ “Remove the Fan Tray Carriage” on page 129

Do the following:

1. Remove the flex cable retainer.


Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the
chassis.

2. Unplug the cable from the hard drive backplane.

3. Loosen the three captive No. 2 Phillips screws.

FIGURE: Removing the Hard Drive Backplane

4. Lift the backplane up and out of the system.

▼ Install the Hard Drive Backplane


1. Lower the hard drive backplane into the system.
Align the tab on the lower edge of the backplane with the corresponding slot in the
chassis floor.

FIGURE: Installing the Hard Drive Backplane

2. Tighten the three captive No. 2 Phillips screws.

3. Plug the cable into its connector on the backplane.

4. Install the flex cable retainer.


Place the retainer into position and tighten the captive No. 2 Phillips screw.

Next Steps
■ “Install the Fan Tray Carriage” on page 131
■ “Install a Fan Tray” on page 84
■ “Install a CMP/Memory Module” on page 100
■ “Install the Top Cover” on page 150
■ “Install a Hard Drive” on page 78

Note – You must install the hard drives in the correct slots.

■ “Slide the Server Into the Rack” on page 151


■ “Power On the Server” on page 153

Servicing the Motherboard
Note – If you are replacing a faulty motherboard, you must set diag_mode to normal
or off before performing this procedure.

Related Information
■ “Controlling How POST Runs” on page 26
■ “Servicing CMP/Memory Modules” on page 98
■ “Servicing PCIe Cards” on page 92
■ “Servicing the Service Processor” on page 120
■ “Servicing the Fan Tray Carriage” on page 129
■ “Motherboard Fastener Locations” on page 139

▼ Remove the Motherboard


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Remove the Server From the Rack” on page 67
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a PCIe Card” on page 93

Note – You must remove all PCIe cards. Note the location of all PCIe cards so you
can install them in the correct slots during reassembly.

■ “Remove the Service Processor” on page 120


■ “Remove a CMP/Memory Module” on page 99

Note – You must remove all CMP and memory modules.

■ “Remove a Fan Tray” on page 83

Note – You must remove all four fan trays.

■ “Remove the Fan Tray Carriage” on page 129

1. Remove the CMP/memory module bracket.


The bracket is secured with six captive No. 2 Phillips screws. See FIGURE:
CMP/Memory Module Bracket Captive Screw Locations on page 136.

FIGURE: CMP/Memory Module Bracket Captive Screw Locations

2. Remove the flex cable retainer.


Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the
chassis.

3. Unplug the flex cable from J9801 on the motherboard.

4. Unplug the auxiliary power cable from J9803 on the motherboard.

5. Unplug the front I/O connector from J9901 on the motherboard.

6. Remove the six No. 2 Phillips screws that secure the bus bar assembly to the
motherboard.

7. Slide the chassis midwall panel up.

Note – Use the clips to secure the midwall panel in the open position.

8. Loosen the No. 2 Phillips screws that secure the motherboard to the chassis
floor.
See FIGURE: Motherboard Fastener Locations on page 140 for the fastener
locations.

9. Lift the motherboard up and out of the chassis.


Guide the flex cable connector out from under the midwall partition.

FIGURE: Removing the Motherboard

10. Place the motherboard on an antistatic mat.

Next Steps
If you are replacing a faulty motherboard, you must program the chassis serial
number and product part number into the new motherboard. See your service
representative.

▼ Install the Motherboard


1. Ensure that all 14 captive screws in the motherboard are retracted.

2. Lower the motherboard down into the chassis.
Guide the flex cable connector through the midwall partition.

FIGURE: Installing the Motherboard

3. Secure the No. 2 captive Phillips screws.


Ensure that all fasteners are secured. (See FIGURE: Motherboard Fastener
Locations on page 140.)

4. Lower and secure the midwall partition.

5. Install the six No. 2 Phillips screws that secure the bus bar assembly to the
motherboard.

6. Install the CMP/memory module bracket.


The bracket is secured with six No. 2 Phillips screws.

7. Plug in the auxiliary power cable to J9803.

8. Plug in the flex cable connector to J9801.

9. Install the flex cable retainer.


Place the retainer into position and tighten the captive No. 2 Phillips screw.

10. Plug in the front I/O cable to J9901.

Next Steps
■ “Install the Fan Tray Carriage” on page 131
■ “Install a Fan Tray” on page 84

Note – Install all four fan trays.

■ “Install a CMP/Memory Module” on page 100

Note – Install all CMP and memory modules.

■ “Install the Service Processor” on page 122


■ “Install a PCIe Card” on page 94
■ “Install the Top Cover” on page 150
■ “Install the Server Into the Rack” on page 150
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153

Motherboard Fastener Locations


FIGURE: Motherboard Fastener Locations on page 140 shows the location of the
captive screws that secure the motherboard to the chassis floor.



FIGURE: Motherboard Fastener Locations

Related Information
■ “Servicing the Motherboard” on page 135

Servicing the Flex Cable Assembly


The flex cable assembly provides the power and data connection between the power
supply backplane, hard drive backplane, and motherboard.

Related Information
■ “Safety Information” on page 59
■ “Servicing Power Supplies” on page 85

■ “Servicing the Power Distribution Board” on page 126
■ “Servicing the Hard Drive Backplane” on page 132
■ “Servicing the Motherboard” on page 135

▼ Remove the Flex Cable Assembly


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Extend the Server to the Maintenance Position” on page 66
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69

Do the following:

1. Unplug the power cords.

2. Remove the flex cable retainer.
Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the
chassis.



FIGURE: Removing the Flex Cable Retainer

3. Unplug the flex cable-to-power supply backplane connection.

4. Unplug the flex cable-to-hard drive backplane connection.

5. Unplug the flex cable-to-DVD-ROM drive connection.

6. Unplug the flex cable-to-motherboard connection.

7. Lift the flex cable up and out of the system.

▼ Install the Flex Cable Assembly


1. Ensure the power cables are unplugged.

2. Plug in the motherboard connector.

3. Plug in the hard drive backplane connector.

4. Plug in the DVD-ROM drive connector.

5. Plug in the power supply backplane connector.

6. Install the flex cable retainer.
Place the retainer into position and tighten the captive No. 2 Phillips screw.

FIGURE: Installing the Flex Cable Retainer

7. Plug in the power cables.

Next Steps
■ “Install the Top Cover” on page 150
■ “Slide the Server Into the Rack” on page 151
■ “Power On the Server” on page 153



Servicing the Front Control Panel
The front control panel contains system status LEDs and the Power button.

Related Information
■ “Infrastructure Boards and Cables” on page 1
■ “Front Panel Diagram” on page 3
■ “Front Panel LEDs” on page 4

▼ Remove the Front Control Panel


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Remove the Server From the Rack” on page 67
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a Fan Tray” on page 83
■ “Remove the Fan Tray Carriage” on page 129

1. Unplug the front control panel cable from J9901 on the motherboard.

2. Unplug the front control panel cable from the front I/O board.

3. Remove the two No. 2 Phillips screws.



FIGURE: Removing the Front Control Panel

4. Lift the front control panel up and out of the system.

5. Place the front control panel on an antistatic mat.

▼ Install the Front Control Panel


1. Lower the front control panel into the system.



FIGURE: Installing the Front Control Panel

2. Install the two No. 2 Phillips screws.

3. Plug the front control panel connector into the front I/O board.

4. Plug the front control panel connector into J9901 on the motherboard.

Next Steps
■ “Install the Fan Tray Carriage” on page 131
■ “Install a Fan Tray” on page 84
■ “Install the Top Cover” on page 150
■ “Install the Server Into the Rack” on page 150
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153

Servicing the Front I/O Board


The front I/O board contains two USB connectors. You must remove the front control
panel to service the front I/O board.



Related Information
■ “Infrastructure Boards and Cables” on page 1
■ “Front Panel Diagram” on page 3
■ “Servicing the Front Control Panel” on page 144

▼ Remove the Front I/O Board


Before you begin, complete these tasks:
■ Read the section, “Safety Information” on page 59.
■ Power off the server, using one of the methods described in the section, “Powering
Off the System” on page 63.
■ “Disconnect Power Cords From the Server” on page 65
■ “Remove the Server From the Rack” on page 67
■ “Perform Electrostatic Discharge – Antistatic Prevention Measures” on page 69
■ “Remove the Top Cover” on page 69
■ “Remove a Fan Tray” on page 83
■ “Remove the Fan Tray Carriage” on page 129

1. Unplug the front control panel cable from J9901 on the motherboard.

2. Unplug the front control panel cable from the front I/O board.

3. Remove the two No. 2 Phillips screws.



FIGURE: Removing the Front I/O Board

4. Lift the front I/O board up and out of the system.

5. Place the front I/O board on an antistatic mat.

▼ Install the Front I/O Board


1. Lower the front I/O board into the system.

2. Install the two No. 2 Phillips screws.

3. Plug the front control panel connector into the front I/O board.

4. Plug the front control panel connector into J9901 on the motherboard.

Next Steps
■ “Install the Fan Tray Carriage” on page 131
■ “Install a Fan Tray” on page 84
■ “Install the Top Cover” on page 150
■ “Install the Server Into the Rack” on page 150
■ “Connect the Power Cords to the Server” on page 153
■ “Power On the Server” on page 153



Returning the Server to Operation

These topics describe how to return the SPARC Enterprise T5440 server to
operation after you have performed service procedures.

Caution – Never attempt to run the server with the cover removed. Hazardous
voltage is present.

Caution – Equipment damage could occur if you run the server with the cover
removed. The cover must be in place for proper air flow.

Topic                                                                                Links

Install the top cover after servicing internal components.                           “Install the Top Cover” on page 150
Re-attach the server to the cabinet slide rails after performing a bench procedure.  “Install the Server Into the Rack” on page 150
Slide the server back into the equipment rack.                                       “Slide the Server Into the Rack” on page 151
Re-attach power cords and data cables to the back panel of the server.               “Connect the Power Cords to the Server” on page 153
Power on the server after performing a service procedure.                            “Power On the Server” on page 153

Related Information
■ “Preparing to Service the System” on page 59
■ “Servicing Customer-Replaceable Units” on page 71
■ “Servicing Field-Replaceable Units” on page 115

▼ Install the Top Cover
If you removed the top cover, perform the steps in this procedure.

Note – If removing the top cover caused an emergency shutdown, you must install
the top cover and use the poweron command to restart the system. See “Power On
the Server” on page 153.

1. Place the top cover on the chassis.
Set the cover down so that it hangs over the rear of the server by about an inch
(25.4 mm).

2. Slide the top cover forward until it seats.

3. Secure the top cover by tightening the two captive screws along the rear edge.

▼ Install the Server Into the Rack


The following procedure explains how to insert the server into the rack.

Caution – The weight of the server on extended slide rails can be enough to
overturn an equipment rack. Before you begin, deploy the antitilt feature on your
cabinet.

Caution – The server weighs approximately 88 lb (40 kg). Two people are required
to lift and mount the server into a rack enclosure when using the procedures in this
chapter.

1. Slide the inner slide assemblies out from the outer rails about 2 inches (5 cm)
from the front face of the rail’s bracket.
The inner slide assemblies should be locked past the internal stop. See FIGURE:
Inserting the System Into the Rack on page 151.
Ensure that the ball bearing retainer is locked all the way forward.



FIGURE: Inserting the System Into the Rack

2. Lift the server up and insert the inner rails into the inner slide assemblies.
Ensure that the inner rails are horizontal when the inner rails enter the inner slide
assemblies.

3. Ensure that the inner rails are engaged with the ball-bearing retainers on both
inner slide assemblies.

Note – If necessary, support the server with the mechanical lift while aligning the
inner rails parallel to the rack-mounted inner slide assemblies.

▼ Slide the Server Into the Rack


1. Press the inner rail release buttons (FIGURE: Slide Rail Release Button Location
on page 152) on both sides of the server.



FIGURE: Slide Rail Release Button Location

Figure Legend

1 Inner rail release button
2 Slide rail lock

2. While pushing on the release buttons, slowly push the server into the rack.
Ensure that the cables do not get in the way.

3. If necessary, re-attach the CMA.

a. Attach the CMA support strut to the inner glide.

b. Attach the CMA to the inner glide.
Slide the hinge plate into the end of the outer rail until the retaining pin snaps
into place.

4. Reconnect the cables to the back of the server.
If the CMA is in the way, slide the server partially out of the cabinet to access the
necessary rear panel connections.



▼ Connect the Power Cords to the Server
● Reconnect both power cords to the power supplies.

Note – As soon as the power cords are connected, standby power is applied.
Depending on the configuration of the firmware, the system might boot. See the
SPARC Enterprise T5440 Server Administration Guide for configuration and power-on
information.

▼ Power On the Server


● To power on the server, do one of the following:
■ To initiate the power-on sequence from the service processor prompt, issue the
poweron command in the ALOM compatibility shell, or the equivalent
start /SYS command at the ILOM -> prompt.
You will see an -> Alert message on the system console. This message
indicates that the system is reset. You will also see a message indicating that the
VCORE has been margined up to the value specified in the default.scr file
that was previously configured.
Example (ILOM prompt):

-> start /SYS

■ To initiate the power-on sequence manually, use a pen or pencil to press the
Power button on the front panel. See “Front Panel Diagram” on page 3 for
Power button location.

Note – If you are powering on the server following an emergency shutdown
triggered by the top cover interlock switch, you must use the poweron command.
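The rule in the note above can be captured in a few lines. The following Python sketch is illustrative only; the function name and the method labels are hypothetical and are not part of any Sun or Fujitsu tool.

```python
def allowed_power_on_methods(interlock_shutdown):
    """List the power-on methods this procedure permits.

    interlock_shutdown: True if the last shutdown was an emergency
    shutdown triggered by the top cover interlock switch.
    """
    methods = ["poweron command"]
    if not interlock_shutdown:
        # The front panel Power button is an option only when the
        # server was not shut down by the top cover interlock.
        methods.append("front panel Power button")
    return methods
```

A checklist script could call this before prompting a technician, so that the Power button is never offered after an interlock shutdown.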

Performing Node Reconfiguration

If a CMP/memory module pair develops a fault, the SPARC Enterprise T5440 can be
reconfigured to run in a degraded state until the CMP/memory module is replaced.
In addition, you can add CMP/memory module pairs to existing systems. However,
adding or removing CMP/memory modules might affect internal hardware device
addresses, as well as the device address of any external devices attached to the
system such as external disk arrays and devices attached via an External I/O
Expansion Unit.

Depending on which CMP/memory module is added or removed, it might be
necessary to manually reassign one or more I/O devices before they can function
correctly in the new system configuration.

Topic                                                                                 Links

Learn about how CMP/memory modules map to I/O devices.                                “I/O Connections to CMP/Memory Modules” on page 156
Learn how to reconfigure the server to temporarily bypass a failed CMP/memory module  “Reconfiguring I/O Device Nodes” on page 158
Disable memory modules                                                                “Temporarily Disable All Memory Modules” on page 160
Reconfigure I/O and PCIe fabric                                                       “Reconfigure the I/O and PCIe Fabric” on page 158
Re-enable memory modules to work in a new I/O and PCIe configuration                  “Re-Enable All Memory Modules” on page 161
Reset logical domain guest configuration                                              “Reset the LDoms Guest Configuration” on page 162
Reference for system bus topology                                                     “System Bus Topology” on page 162
Reference for I/O fabric in supported configurations                                  “I/O Fabric in 2P Configuration” on page 164
                                                                                      “I/O Fabric in 4P Configuration” on page 165

Related Information
■ “Managing Faults” on page 9

■ “Servicing PCIe Cards” on page 92
■ “Servicing CMP/Memory Modules” on page 98
■ “Servicing FB-DIMMs” on page 104

I/O Connections to CMP/Memory Modules
Each PCIe slot and onboard I/O device is connected to one CMP module. Device
addresses depend on the system configuration. See TABLE: Devices controlled by
CMPs in 2P systems on page 164 and TABLE: Devices controlled by CMPs in 4P
systems on page 165 for more information.

If a CMP module fails, the onboard devices and slots directly connected to it become
unavailable. Recovery of the I/O services connected to the failed CMP requires I/O
node reconfiguration.

For example, in a 4P system, if CMP0 goes offline, the following devices become
unavailable:
■ PCIe0
■ PCIe1
■ Onboard hard drives

In this failure scenario, the system is unable to boot from internal drives.

Similarly, if CMP1 goes offline, the following devices become unavailable:
■ PCIe4
■ PCIe5
■ Onboard network devices

Related Information
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165



Recovering from a Failed CMP/Memory Module
If your system experiences a complete CMP/memory module failure, do one of the
following:

1. Replace the failed CMP/memory module.

2. If a replacement CMP module is not available, remove the failed CMP module and
replace it with a CMP module from a different slot that does not have any directly
connected I/O devices in use (see TABLE: Devices controlled by CMPs in 2P
systems on page 164 and TABLE: Devices controlled by CMPs in 4P systems on
page 165). If this leaves a memory module without its associated CMP module,
remove the memory module.

Note – At a minimum, a functioning CMP module must be installed in CMP Slot 0.
If you are performing a node reconfiguration following a failure in CMP Slot 0, you
must move one of the remaining CMP modules to CMP Slot 0.

3. If neither option (1) nor (2) is possible, you must do the following:
■ “Temporarily Disable All Memory Modules” on page 160
■ “Reconfigure the I/O and PCIe Fabric” on page 158
■ “Re-Enable All Memory Modules” on page 161
■ “Reset the LDoms Guest Configuration” on page 162
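
The three recovery options above form a simple decision sequence. The Python sketch below restates it for illustration; the function and parameter names are hypothetical, and the returned strings paraphrase the steps in this section rather than commands you can run.

```python
def recovery_plan(replacement_available, donor_cmp_available, failed_slot):
    """Return the ordered actions for a complete CMP/memory module failure.

    replacement_available: a replacement CMP/memory module is on hand.
    donor_cmp_available:   another slot holds a CMP module that has no
                           directly connected I/O devices in use.
    failed_slot:           number of the CMP slot that failed.
    """
    if replacement_available:
        # Option 1: straightforward replacement.
        return ["Replace the failed CMP/memory module"]
    if donor_cmp_available:
        # Option 2: borrow a CMP module from a slot whose I/O is unused.
        return [
            f"Remove the failed CMP module from CMP Slot {failed_slot}",
            f"Move the donor CMP module into CMP Slot {failed_slot}",
            "Remove any memory module left without its associated CMP module",
        ]
    # Option 3: full node reconfiguration, in the order listed above.
    return [
        "Temporarily Disable All Memory Modules",
        "Reconfigure the I/O and PCIe Fabric",
        "Re-Enable All Memory Modules",
        "Reset the LDoms Guest Configuration",
    ]
```

Whichever branch applies, the note above still holds: a functioning CMP module must end up in CMP Slot 0.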

Related Information
■ “Managing Faults” on page 9
■ “Servicing CMP/Memory Modules” on page 98
■ “Servicing FB-DIMMs” on page 104
■ “I/O Connections to CMP/Memory Modules” on page 156
■ “Reconfiguring I/O Device Nodes” on page 158
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165



Reconfiguring I/O Device Nodes
You might need to change the connection between the CMP modules and the
onboard devices described in TABLE: Devices controlled by CMPs in 2P systems on
page 164 or TABLE: Devices controlled by CMPs in 4P systems on page 165 in one of
the following circumstances:
■ A CMP module has completely failed, you need access to a PCIe slot or device
that was attached to that CMP module, and you cannot temporarily replace the
failed module or move an existing module over from a different slot until the
failed CMP module is replaced.
■ You are upgrading from a 2P to a 4P system.

Related Information
■ “Managing Faults” on page 9
■ “I/O Connections to CMP/Memory Modules” on page 156
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165
■ “Temporarily Disable All Memory Modules” on page 160
■ “Reconfigure the I/O and PCIe Fabric” on page 158
■ “Re-Enable All Memory Modules” on page 161
■ “Reset the LDoms Guest Configuration” on page 162

▼ Reconfigure the I/O and PCIe Fabric


The reconf.pl script reconfigures the PCIe fabric to reconnect the PCIe slots and
onboard devices to the CMP nodes as efficiently as possible. The reconf.pl script
also reconfigures the Solaris device names to match the new connections between the
CMP modules and the PCIe devices and slots. Use the reconf.pl script to reattach
each PCIe slot and onboard device to its nearest available CMP module.

To use the reconf.pl script, you must have the following:
■ Solaris OS JumpStart™ server
■ Net install image
■ The reconf.pl script

Do the following:

1. Download the reconf.pl script, available at:
(https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=IORSC-SP-E54440SV-G-F@CDS-CDS_SMI)

2. Copy the reconf.pl script to the root directory of the “miniroot” of the
netinstall image. This is the Solaris_10/Tools/Boot directory of your
exported Solaris 10 8/07, Solaris 10 5/08, or Solaris 10 10/08 OS image on your
JumpStart server.

3. Power off the system.

4. Log in to the ALOM compatibility shell. Type:

sc> setsc sys_ioreconfigure nextboot

5. Power on the system.

6. Boot from the network in single-user mode. Type:

ok boot net -s

7. Mount the system boot disk under the /mnt directory. Type:

# mount /dev/dsk/c0t0d0s0 /mnt

8. Change to the root directory of your boot disk. Type:

# cd /mnt

9. Copy the reconf.pl script to the root of the boot disk. Do one of the following:
■ If your JumpStart server is exporting Solaris 10 8/07 or Solaris 10 5/08, type:

# cp /reconf.pl .

■ If your JumpStart server is exporting Solaris 10 10/08, type:

# cp /cdrom/Solaris_10/Tools/Boot/reconf.pl .

10. Run the reconf.pl script. Type:

# /mnt/reconf.pl

11. Halt the system. Type:

# halt

12. Power off the system. For example, to power off using the ALOM compatibility
shell, type:

sc> poweroff

Wait for the console message which indicates that the system has been powered
off.
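
The only branch in the procedure above is the copy step, where the source path of reconf.pl depends on which Solaris 10 release the JumpStart server exports. As a minimal sketch, that choice could be tabulated as follows. The dictionary and function names are hypothetical; the release strings and paths are copied from the steps above.

```python
# Where reconf.pl appears inside the booted miniroot, keyed by the
# Solaris release exported by the JumpStart server (from the copy step).
RECONF_SOURCE = {
    "Solaris 10 8/07": "/reconf.pl",
    "Solaris 10 5/08": "/reconf.pl",
    "Solaris 10 10/08": "/cdrom/Solaris_10/Tools/Boot/reconf.pl",
}


def copy_command(release):
    """Return the cp command to run from /mnt for the given release."""
    try:
        source = RECONF_SOURCE[release]
    except KeyError:
        # Releases not named in the procedure are deliberately rejected
        # rather than guessed at.
        raise ValueError(f"release not covered by this procedure: {release}") from None
    return f"cp {source} ."
```

For example, `copy_command("Solaris 10 5/08")` yields the same `cp /reconf.pl .` line shown in the procedure.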

▼ Temporarily Disable All Memory Modules


A disabled CMP node complicates the memory topology and can prevent a system
from booting. To run the system in a degraded state, you must reduce the total
amount of system memory by disabling all of the FB-DIMMs on all of the memory
modules in order to work around this complication.

If you are recovering from a failed CMP module, you must temporarily disable the
FB-DIMMs on all memory modules when Solaris is halted and the system is
powered off. The FB-DIMMs are re-enabled after the I/O and PCIe devices are
reconfigured.

You can either physically remove the memory modules from the system, or remotely
disable all FB-DIMMs located on all memory modules using the
disablecomponent command.

To remove the memory modules from the system, see “Servicing CMP/Memory
Modules” on page 98.

To remotely disable all FB-DIMMs in the system, do the following:

1. Halt the Solaris OS.

2. Power off the system.

3. Disable each FB-DIMM.

sc> disablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D1
sc> disablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D2
...
sc> disablecomponent /SYS/MB/MEMx/CMPx/BR1/CH1/D3

where x is the memory module to be disabled.

CODE EXAMPLE: Using the disablecomponent command to disable all
FB-DIMMs on MEM1 on page 161 shows how to disable all the FB-DIMMs on
MEM1.

CODE EXAMPLE: Using the disablecomponent command to disable all FB-DIMMs on MEM1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D2
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D3
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D2
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D3
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D2
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D3
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D2
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D3
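
Typing twelve commands per module by hand is error-prone, so the list can be generated instead. The Python sketch below assumes that every CMP/memory module pair uses the same layout as the MEM1 example above (branches BR0 and BR1, channels CH0 and CH1, FB-DIMMs D1 to D3). That layout is an assumption for other modules, and the function names are hypothetical.

```python
from itertools import product


def fbdimm_paths(module):
    """Yield the 12 FB-DIMM targets for one CMP/memory module pair.

    Assumes the BR0-BR1 / CH0-CH1 / D1-D3 layout shown for MEM1.
    """
    for branch, channel, dimm in product(range(2), range(2), range(1, 4)):
        yield f"/SYS/MB/MEM{module}/CMP{module}/BR{branch}/CH{channel}/D{dimm}"


def disable_commands(module):
    """Return the disablecomponent command lines for one module."""
    return [f"disablecomponent {path}" for path in fbdimm_paths(module)]


if __name__ == "__main__":
    # Print the commands for module 1 in the order used in the example.
    for line in disable_commands(1):
        print("sc>", line)
```

The output is meant to be reviewed and then pasted at the sc> prompt; the sketch does not connect to the service processor.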

▼ Re-Enable All Memory Modules


Now that the connection between the CMP modules and the I/O devices has been
reestablished, you can re-enable the FB-DIMMs that were temporarily disabled in
“Temporarily Disable All Memory Modules” on page 160.

● Do one of the following:
■ Install the memory modules if you removed them.
■ Re-enable all of the FB-DIMMs that you previously disabled, using the
enablecomponent command.

sc> enablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D1
sc> enablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D2
...
sc> enablecomponent /SYS/MB/MEMx/CMPx/BR1/CH1/D3

where x is the CMP/memory module to be enabled.

CODE EXAMPLE: Using the enablecomponent command to enable all
FB-DIMMs on MEM1 on page 162 shows how to enable all the FB-DIMMs on
MEM1.

CODE EXAMPLE: Using the enablecomponent command to enable all FB-DIMMs on MEM1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D2
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D3
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D2
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D3
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D2
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH0/D3
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D2
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR1/CH1/D3
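
If you kept a record of the disablecomponent lines you issued earlier, the matching enablecomponent lines can be derived from it, which helps ensure that exactly the same FB-DIMMs are re-enabled. The following Python sketch assumes the record is a list of text lines, with or without the sc> prompt; the function name is hypothetical.

```python
def to_enable_commands(disable_lines):
    """Convert recorded disablecomponent lines to enablecomponent lines."""
    enable_lines = []
    for raw in disable_lines:
        line = raw.strip()
        if line.startswith("sc>"):
            # Drop the ALOM prompt if it was captured with the command.
            line = line[len("sc>"):].strip()
        if not line:
            continue  # skip blank lines in the record
        verb, _, target = line.partition(" ")
        if verb != "disablecomponent" or not target.strip():
            raise ValueError(f"not a disablecomponent line: {raw!r}")
        enable_lines.append(f"enablecomponent {target.strip()}")
    return enable_lines
```

Rejecting unexpected lines, rather than silently skipping them, makes it harder to miss an FB-DIMM that was disabled by a mistyped or truncated record.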

▼ Reset the LDoms Guest Configuration


After reconfiguring the I/O and PCIe fabric, you must recreate your LDoms guest
configurations, as hardware resources that had been previously assigned to your
guests might no longer be available.

1. Power off the system.

2. In the ALOM compatibility shell, type:

sc> bootmode config="factory-default"

3. Power on the system.

4. Recreate your LDoms guests using the remaining hardware resources.

System Bus Topology


FIGURE: System Bus Topology on page 163 describes the system bus topology for the
SPARC Enterprise T5440 server.



FIGURE: System Bus Topology

Related Information
■ “I/O Fabric in 2P Configuration” on page 164
■ “I/O Fabric in 4P Configuration” on page 165



I/O Fabric in 2P Configuration
TABLE: Devices controlled by CMPs in 2P systems

CMP Number    Devices Controlled

CMP0          Onboard disk drives
              Onboard USB ports
              Onboard DVD drive
              PCIe0
              PCIe1
              PCIe2
              PCIe3
CMP1          Onboard Gbit or 10-Gbit network
              PCIe4
              PCIe5
              PCIe6
              PCIe7

Related Information
■ “System Bus Topology” on page 162
■ “I/O Fabric in 4P Configuration” on page 165



I/O Fabric in 4P Configuration
TABLE: Devices controlled by CMPs in 4P systems

CMP Number    Devices Controlled

CMP0          Onboard disk drives
              Onboard USB ports
              Onboard DVD drive
              PCIe0
              PCIe1
CMP1          Onboard Gbit or 10-Gbit network
              PCIe4
              PCIe5
CMP2          PCIe2
              PCIe3
CMP3          PCIe6
              PCIe7
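
When planning a node reconfiguration, it can help to query the 2P and 4P tables above programmatically. The Python sketch below transcribes both tables into a dictionary and adds two lookups. The names IO_FABRIC, devices_lost, and owner_of are hypothetical; only the device assignments come from this manual.

```python
# Device ownership per CMP, transcribed from the 2P and 4P tables.
ONBOARD_CMP0 = ["Onboard disk drives", "Onboard USB ports", "Onboard DVD drive"]
ONBOARD_CMP1 = ["Onboard Gbit or 10-Gbit network"]

IO_FABRIC = {
    "2P": {
        "CMP0": ONBOARD_CMP0 + ["PCIe0", "PCIe1", "PCIe2", "PCIe3"],
        "CMP1": ONBOARD_CMP1 + ["PCIe4", "PCIe5", "PCIe6", "PCIe7"],
    },
    "4P": {
        "CMP0": ONBOARD_CMP0 + ["PCIe0", "PCIe1"],
        "CMP1": ONBOARD_CMP1 + ["PCIe4", "PCIe5"],
        "CMP2": ["PCIe2", "PCIe3"],
        "CMP3": ["PCIe6", "PCIe7"],
    },
}


def devices_lost(config, failed_cmp):
    """Return the devices that become unavailable if failed_cmp goes offline."""
    return list(IO_FABRIC[config][failed_cmp])


def owner_of(config, device):
    """Return the CMP that controls the given device in this configuration."""
    for cmp_name, devices in IO_FABRIC[config].items():
        if device in devices:
            return cmp_name
    raise KeyError(device)
```

For instance, `owner_of("2P", "PCIe6")` and `owner_of("4P", "PCIe6")` give different answers, which is why device addresses can change when a system is upgraded from 2P to 4P.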

Related Information
■ “System Bus Topology” on page 162
■ “I/O Fabric in 2P Configuration” on page 164

Connector Pinouts

This section provides reference information about the system back panel ports and
pin assignments.

Topic                                    Links

Reference for system connector pinouts   “Serial Management Port Connector Pinouts” on page 167
                                         “Network Management Port Connector Pinouts” on page 168
                                         “Serial Port Connector Pinouts” on page 169
                                         “USB Connector Pinouts” on page 169
                                         “Gigabit Ethernet Connector Pinouts” on page 170

Related Information
■ “Identifying Server Components” on page 1

Serial Management Port Connector Pinouts
The serial management connector (labeled SERIAL MGT) is an RJ-45 connector
located on the back panel. This port is the default connection to the system console.

FIGURE: Serial Management Connector Diagram

TABLE: Serial Management Connector Signals

Pin   Signal Description      Pin   Signal Description

1     Request to Send         5     Ground
2     Data Terminal Ready     6     Receive Data
3     Transmit Data           7     Data Set Ready
4     Ground                  8     Clear to Send

Network Management Port Connector Pinouts
The network management connector (labeled NET MGT) is an RJ-45 connector
located on the motherboard and can be accessed from the back panel. This port
needs to be configured prior to use.



FIGURE: Network Management Connector Diagram

TABLE: Network Management Connector Signals

Pin   Signal Description         Pin   Signal Description

1     Transmit Data +            5     Common Mode Termination
2     Transmit Data –            6     Receive Data –
3     Receive Data +             7     Common Mode Termination
4     Common Mode Termination    8     Common Mode Termination

Serial Port Connector Pinouts


The serial port connector (TTYA) is a DB-9 connector that can be accessed from the
back panel.

FIGURE: Serial Port Connector Diagram

TABLE: Serial Port Connector Signals

Pin   Signal Description      Pin   Signal Description

1     Data Carrier Detect     6     Data Set Ready
2     Receive Data            7     Request to Send
3     Transmit Data           8     Clear to Send
4     Data Terminal Ready     9     Ring Indicate
5     Ground



USB Connector Pinouts
Two Universal Serial Bus (USB) ports are located on the motherboard in a
double-stacked layout and can be accessed from the back panel. Two additional USB
ports are located on the front panel.

FIGURE: USB Connector Diagram (pins numbered 1 to 4 on each stacked connector)

TABLE: USB Connector Signals

Pin Signal Description Pin Signal Description

A1 +5 V (fused) B1 +5 V (fused)
A2 USB0/1- B2 USB2/3-
A3 USB0/1+ B3 USB2/3+
A4 Ground B4 Ground

Gigabit Ethernet Connector Pinouts


Four RJ-45 Gigabit Ethernet connectors (NET0, NET1, NET2, NET3) are located on
the system motherboard and can be accessed from the back panel. The Ethernet
interfaces operate at 10 Mbit/sec, 100 Mbit/sec, and 1000 Mbit/sec.



FIGURE: Gigabit Ethernet Connector Diagram

TABLE: Gigabit Ethernet Connector Signals

Pin   Signal Description           Pin   Signal Description

1     Transmit/Receive Data 0 +    5     Transmit/Receive Data 2 –
2     Transmit/Receive Data 0 –    6     Transmit/Receive Data 1 –
3     Transmit/Receive Data 1 +    7     Transmit/Receive Data 3 +
4     Transmit/Receive Data 2 +    8     Transmit/Receive Data 3 –
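
The table above lists signals by pin number, but cable checks usually go pair by pair. The Python sketch below regroups the same data by differential pair. The names are hypothetical, and the code uses an ASCII "-" where the table prints "–".

```python
# pin -> (data pair number, polarity), transcribed from the table above.
GIGABIT_PINS = {
    1: (0, "+"), 2: (0, "-"),
    3: (1, "+"), 6: (1, "-"),
    4: (2, "+"), 5: (2, "-"),
    7: (3, "+"), 8: (3, "-"),
}


def pins_by_pair():
    """Group the RJ-45 pins by Transmit/Receive Data pair."""
    pairs = {}
    for pin, (pair, polarity) in sorted(GIGABIT_PINS.items()):
        pairs.setdefault(pair, {})[polarity] = pin
    return pairs
```

Grouped this way, it is easier to see that pair 1 is split across pins 3 and 6 while pair 2 occupies the two centre pins, 4 and 5.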

Server Components

This section provides illustrations depicting system components.

Description                                                                       Links

A diagram and list of customer-replaceable units (CRUs)                           “Customer-Replaceable Units” on page 174
A diagram and list of components that only field service personnel can replace.   “Field-Replaceable Units” on page 176

Related Information
■ “Identifying Server Components” on page 1
■ “Servicing Customer-Replaceable Units” on page 71
■ “Servicing Field-Replaceable Units” on page 115

Customer-Replaceable Units
FIGURE: Customer-Replaceable Units (CRUs)

Figure Legend

1   CMP modules              5   Front bezel
2   Memory modules           6   Hard drives
3   Fan trays                7   Power supplies
4   Removable media drive    8



Related Information
■ “Hot-Pluggable and Hot-Swappable Devices” on page 72
■ “Servicing Hard Drives” on page 72
■ “Servicing Fan Trays” on page 81
■ “Servicing Power Supplies” on page 85
■ “Servicing CMP/Memory Modules” on page 98
■ “Servicing FB-DIMMs” on page 104
■ “Servicing the Front Bezel” on page 115
■ “Servicing the DVD-ROM Drive” on page 118



Field-Replaceable Units
FIGURE: Field-Replaceable Units (FRUs)

Figure Legend

1   CMP/memory module bracket    4   Power supply backplane
2   Fan cage                     5   Flex cable assembly
3   Hard drive backplane         6   Auxiliary power cable



FIGURE: Field-Replaceable Units (FRUs) (Motherboard and Auxiliary Boards)

Figure Legend

1 IDPROM 4 Motherboard
2 Front Control Panel 5 Battery
3 Front I/O Board 6 Service Processor

Related Information
■ “Servicing the Service Processor” on page 120
■ “Servicing the IDPROM” on page 123
■ “Servicing the Battery” on page 125
■ “Servicing the Power Distribution Board” on page 126

■ “Servicing the Fan Tray Carriage” on page 129
■ “Servicing the Hard Drive Backplane” on page 132
■ “Servicing the Motherboard” on page 135
■ “Servicing the Flex Cable Assembly” on page 140
■ “Servicing the Front Control Panel” on page 144
■ “Servicing the Front I/O Board” on page 146



Index

Numerics
3.3V standby (power supply rail), 2

A
AC Present (power supply LED), 12, 92
adding
  CMP/memory module, 101
  FB-DIMMs, 109
  PCIe card, 94
addresses, device
  and system configuration, 156
advanced ECC technology, 21
Advanced Lights Out Management (ALOM) CMT
  connecting to, 22
airflow, blocked, 13
antistatic wrist strap, 61
ASR blacklist, 51, 52
asrkeys (system components), 24
Automatic System Recovery (ASR), 50

B
battery
  installing, 125
  removing, 125
blacklist, ASR, 51
bootmode command, 54
break command, 53

C
cfgadm command, 74, 76, 77
chassis
  dimensions, 1
  serial number, 62
clearfault command, 53
clearing POST-detected faults, 48
clearing PSH-detected faults, 49
CMP
  fault recovery, 160
CMP module
  disabling to run system in degraded state, 160
  failure recovery, 157
  fault recovery, 155
  I/O devices connected to, 156
CMP/memory module, 100
  adding, 101
  device identifiers, 103
  installing, 100
  removing, 99
  supported configurations, 104
CMP/memory modules
  supported configurations, 104
CMP0 failure mode, 156
CMP1 failure mode, 156
command
  cfgadm, 74, 76, 77
  disablecomponent, 52
  fmdump, 45
  iostat -E, 77
  removefru, 55
  setlocator, 4, 7, 55, 66
  show faulty, 32, 108
  showfaults, 55
  showfru, 25, 56
component_state (ILOM component property), 48
components
  disabled automatically by POST, 51
  disabling using disablecomponent command, 52
  displaying state of, 51
  displaying using showcomponent command, 24

configuration External I/O Expansion Unit
device addresses, 156 fault detected by show faulty command, 35
connecting to ALOM CMT, 22 faults detection in, 15
console command, 29, 53, 108
consolehistory command, 54 F
Fan Fault (system LED)
D interpreting to diagnose faults, 31
DC OK (power supply LED), 91 fan module
determining fault state, 31
device identifiers
Fault LED, 31
CMP/memory modules, 103
fan tray, 84 fan module LEDs
FB-DIMMs, 112 using to identify faults, 31
hard drive, 79 fan tray, 83
PCIe card, 96 device identifiers, 84
power supply, 91 installing, 82, 84
diag_level parameter, 27, 56 removing, 81, 83
diag_mode parameter, 26, 56 fan tray carriage
installing, 131
diag_trigger parameter, 27, 57
removing, 129
diag_verbosity parameter, 27, 57
fan tray LEDs
diagnostics
about, 84
about, 9
fan trays
flowchart, 11
about, 81
low level, 19
running remotely, 15 Fault (hard drive LED), 31
using SunVTS, 18 Fault (power supply LED), 86, 92
disablecomponent command, 52 fault manager daemon, fmd(1M), 18
displaying FRU status, 25 fault records, 49
dmesg command, 36 fault recovery
DVD-ROM drive I/O device, 158
installing, 119 fault recovery, CMP module, 155
removing, 118 faults
clearing POST-detected faults, 48
E detected by POST, 12, 33, 35
ejector tabs, FB-DIMM, 105 detected by PSH, 12, 34
electrostatic discharge (ESD) diagnosing with LEDs, 30 to 32
preventing, 69 environmental, 12, 13, 33
preventing using an antistatic mat, 61 environmental, displayed by show faulty
preventing using an antistatic wrist strap, 61 command, 34
safety measures, 61 FB-DIMM, 106
forwarded to ILOM, 15
emergency shutdown, 65
recovery, 16
using Power button, 5
repair, 16
enablecomponent command, 48 types of, 33
environmental faults, 12, 13, 16, 33 FB-DIMM fault button, 113
event log, checking the PSH, 45 FB-DIMM Fault LEDs, 32
EVENT_ID, FRU, 45 FB-DIMMs
exercising the system with SunVTS, 38

180 SPARC Enterprise T5440 Server Service Manual • July 2009


disabling to run system in degraded state, 160 determining fault state, 31
FB-DIMMs device identifiers, 79
adding, 109 Fault LED, 31
device identifiers, 112 hot-plugging, 75
diagnosing with fault button, 113 installing, 75, 78
diagnosing with show faulty command, 106 Ready-to-Remove LED, 76
ejector tabs, 105 removing, 73, 77
example POST error output, 42 hard drive backplane, 132
fault handling, 21 about, 2
installing, 105 installing, 133
managing faults in, 106 removing, 132
re-enabling to run system in degraded state, 161 hard drive LEDs, 80
removing, 105 hard drive LEDs, about, 80
supported configurations, 110
help command, 53
troubleshooting, 22
host ID, stored on SCC module, 2
verifying successful replacement, 106
hot-pluggable devices, 72
flex cable assembly
installing, 142 hot-plugging
removing, 141 hard drive, 73, 75
hard drive, situations inhibiting, 73
fmadm command, 49, 108
hot-swappable devices, 72
fmdump command, 45
hot-swapping
front bezel
fan tray, 81, 82
installing, 117
power supply, 86
removing, 116
front control panel
installing, 145
I
removing, 144 I/O connections to CMP module, 156
front I/O board I/O fabric
installing, 148 in 2-processor configuration, 164
removing, 147 in 4-processor configuration, 165
front panel diagram, 3 I/O subsystem, 18, 19, 51
front panel LEDs, 4 IDPROM
installing, 124
FRU event ID, 45
removing, 123
FRU ID PROMs, 15
ILOM commands
FRU information show, 25
displaying with show command, 25 show faulty, 33, 43, 55, 108
FRU status, displaying, 25 ILOM see Integrated Lights Out Management
(ILOM)
G ILOM system event log, 12
Gigabit Ethernet ports indicators, 30
LEDs, 8
infrastructure boards, about, 1
pinouts, 170
See also power distribution board, power supply
backplane, hard drive backplane, front I/O
H board, front control panel
hard drive
installing, 100
about, 72
battery, 125
addressing, 75, 78

Index 181
CMP/memory module, 100 hard drive, 80
DVD-ROM drive, 119 network management port, 8
fan tray, 82, 84 rear panel, 7
fan tray carriage, 131 Service Required (system LED), 32
FB-DIMMs, 105 using to diagnose faults, 30
flex cable assembly, 142 using to identify device state, 30
front bezel, 117 Locator LED and button, 3, 4, 5, 7
front control panel, 145 log files, viewing, 36
front I/O board, 148 logical domains
hard drive, 75, 78 guest configuration, 162
hard drive backplane, 133
IDPROM, 124
motherboard, 138
M
PCIe card, 94 MAC addresses, stored on SCC module, 2
power distribution board, 128 maintenance position, 65, 67
power supply, 87, 90 memory
service processor, 122 also see FB-DIMMs
top cover, 150 fault handling, 21
Integrated Lights Out Manager memory modules
and fault detection in External I/O Expansion see CMP/memory modules
Unit, 15 message ID, 18
iostat -E command, 77 messages file, 35
motherboard
L about, 1
latch fastener locations, 139
power supply, 86, 89 installing, 138
slide rail, 66 removing, 135
LED
AC Present (power supply LED), 12, 92 N
DC OK (power supply LED), 91 network management port
Fan Fault (system LED), 31 LEDs, 8
Fault (fan module LED), 31 pinouts, 168
Fault (hard drive LED), 31 node reconfiguration, 155
Fault (power supply LED), 31, 86, 92 and I/O services, 156
FB-DIMM Fault (motherboard LEDs), 32 I/O device nodes, 158
Gigabit Ethernet port, 8 PCIe, 158
Locator, 4, 7 Normal mode (virtual keyswitch position), 108
Overtemp (system LED), 5, 31 also see setkeyswitch command.
Power OK (system LED), 12
Power Supply Fault (system LED), 5, 31, 88, 92
O
Ready-to-Remove (hard drive LED), 74, 76
Overtemp (system LED), 5, 31
Service Required (system LED), 4, 31, 92
Top (system LED), 5 overtemperature condition, 31
LEDs
about, 30 P
fan module, 31 PCIe card
fan tray, 84 adding, 94
front panel, 4 configuration guidelines, 97



device identifiers, 96 components disabled by, 51
installing, 94 configuration flowchart, 20
removing, 93 controlling output, 26
PCIe fabric reconfiguration, 158 error messages, 42
pinouts fault clearing, 48
Gigabit Ethernet ports, 170 faults detected by, 12, 33
network management port, 168 faulty components detected by, 48
serial management port, 167 parameters, changing, 27
serial port (DB-9), 169 running in maximum mode, 28
USB ports, 169 troubleshooting with, 14
using for fault diagnosis, 13
power cords
plugging into server, 153 Predictive Self-Healing (PSH)
unplugging before servicing the system, 61 about, 17
clearing faults, 49
power distribution board
faults detected by, 12
about, 2
faults displayed by ILOM, 33
installing, 128
memory faults, 21
removing, 126
PSH
power off, 64
see Predictive Self-Healing (PSH)
Power OK (system LED), 12
power supply
Q
about, 85
quick visual notification, 10
AC Present LED, 12, 92
DC OK LED, 91
device identifiers, 91 R
Fault LED, 31, 86, 92 rack
hot-swapping, 87, 90 extending server to maintenance position, 65
installing, 87, 90 removing server from, 67
removing, 86, 89 Ready-to-Remove (hard drive LED), 74, 76
Power Supply Fault (system LED) rear panel access, 5
about, 5, 92 rear panel LEDs, 7
interpreting to diagnose faults, 31 removefru command, 55
using to verify successful power supply removing, 132
replacement, 88 battery, 125
powercycle command, 28, 54 CMP/memory module, 99
powering off server DVD-ROM drive, 118
emergency shutdown, 65 fan tray, 81, 83
from service processor prompt, 64 fan tray carriage, 129
graceful shutdown, 64 FB-DIMMs, 105
service processor command, 64 flex cable assembly, 141
powering on front bezel, 116
at service processor prompt, 153 front control panel, 144
following emergency shutdown triggered by top front I/O board, 147
panel removal, 150, 153 hard drive, 73, 77
using Power button, 153 hard drive backplane, 132
poweron command, 54 IDPROM, 123
power-on self-test (POST), 19 motherboard, 135
about, 19 PCIe card, 93

Index 183
power distribution board, 126 replacement, 108
power supply, 86, 89 showcomponent command, 24, 51
server from rack, 67 showenvironment command, 55
service processor, 120 showfaults command
reset command, 55 syntax, 55
reset, system showfru command, 25, 56
using ILOM, 28 showkeyswitch command, 56
using POST commands, 28
showlocator command, 56
resetsc command, 55
showlogs command, 56
showplatform command, 56, 62
S
shutdown
safety information, 59
triggered by top cover removal (emergency
safety symbols, 60
shutdown), 150
sanity check for hardware components, 19 using Power button (emergency shutdown), 5
SCC module using Power button (graceful shutdown), 5
and host ID, 2 using powercycle command (graceful
and MAC addresses, 2 shutdown), 54
serial management port using powercycle -f command (emergency
pinouts, 167 shutdown), 54
serial number, chassis, 62 using poweroff command, 54
serial port (DB-9) slide rail latch, 66
pinouts, 169 Solaris log files, 12
service processor Solaris log files as diagnostic tool, 12
installing, 122 Solaris OS
removing, 120 checking log files for fault information, 12
Service Required (system LED), 32 collecting diagnostic information from, 35
about, 4 message buffer, checking, 36
cleared by enablecomponent command, 48 message log files, viewing, 36
interpreting to diagnose faults, 31 Solaris Predictive Self-Healing, 17
triggered by ILOM, 15 SunVTS, 18
triggered by power supply fault, 92 as fault diagnosis tool, 12
set command browser environment, 40
and component_state property, 48 Component Stress parameter, 40
setkeyswitch parameter, 28, 55, 56, 107 exercising the system with, 38
setlocator command, 4, 7, 55, 66 software packages, 41
show faulty command, 31, 43, 55 System Exerciser, 40
and faults detected by POST, 35 tests, 42
and PSH faults, 34 user interfaces, 38, 39, 41, 42
and Service Required LED, 32 using for fault diagnosis, 12
description and examples, 32 verifying installation, 38
environmental fault, 34 syslogd daemon, 36
reasons to use, 33 system bus topology, 162
use in detecting faults in an External I/O system console, 23
Expansion Unit, 35 system console, switching to, 23
using to check for faults, 12
system controller, 10
using to diagnose FB-DIMMs, 106
using to verify successful FB-DIMM



T
tools required for service, 62
Top (system LED)
about, 5
top cover
and emergency shutdown, 150
installing, 150
troubleshooting
CMP1 failure, 156
troubleshooting
AC OK LED state, 12
actions, 12
by checking Solaris OS log files, 12
CMP0 failure, 156
FB-DIMMs, 22
Power OK LED state, 12
using LEDs, 30
using POST, 13, 14
using SunVTS, 12
using the show faulty command, 12

U
UltraSPARC T2+ multicore processor, 18
Universal Unique Identifier (UUID), 18, 45
USB ports
pinouts, 169
USB ports (front), 3

V
virtual keyswitch, 28, 107

X
XAUI card
about, 1
configuration guidelines, see PCIe configuration
guidelines
installing, see PCIe card, installing

