
UC on UCS B-Series Troubleshooting Guide

UCS documentation

Uploaded by RustagiSumit

Troubleshooting Supplement

Cisco UCS B-Series

Section Links
UCS tools for Troubleshooting
Blade/Server Troubleshooting
IOM (FEX) Troubleshooting
Fabric Interconnect Troubleshooting
SAN Troubleshooting

UCS tools for Troubleshooting

System Components - Major Points of Service

Cisco UCS Manager
Embedded in Fabric Interconnect

Cisco UCS 6100 Series Fabric Interconnects
UCS 6120XP 20-port Fabric Interconnect
UCS 6140XP 40-port Fabric Interconnect
Points of service: UCS Manager (XML and CLI), NX-OS, physical connections to chassis and core SAN/LAN network, cluster operations

Cisco UCS 2100 Series Fabric Extenders
Logically part of Fabric Switch
Inserts into Blade Enclosure
Points of service: Chassis Management Controller (CMC) operations, chassis discovery, physical connections to the Fabric Interconnect (FI) and logical connections to adapter cards

Cisco UCS 5100 Series Blade Chassis
Flexible bay configurations
Logically part of Fabric Interconnect
Points of service: power, fans, connectors

Cisco UCS B-Series Blade Servers
UCS B200 M1 Blade Server
UCS B250 M1 Extended Memory Blade Server
Points of service: Baseboard Management Controller (BMC) of compute nodes, all compute node components (memory, processors, mezzanine cards, disk)

Cisco UCS Network Adapters
Three adapter options
Mix adapters within blade chassis

61xx Fabric Interconnect (FI)

Active/Active clustered system
Navigate to the proper component when troubleshooting: CLI (NX-OS) or UCSM
Management network: a virtual IP for the cluster, plus IP #A (Switch-A#) and IP #B (Switch-B#)

UCS 2100 Fabric Extender Switch Connection

Each UCS 2100 Fabric Extender in a UCS 5100 Blade Server Chassis is connected to a 6100 Series Fabric Interconnect, for redundancy or bandwidth aggregation.
Each Fabric Extender provides 4 x 10GE ports to the (NX5K-based) Fabric Interconnect.
Link physical health checks and chassis discovery take place over these links.

(Diagram: back of a UCS 5100 Series Blade Server Chassis, with its UCS 2100 Series Fabric Extenders connected to UCS 6100 Series Switch A and Switch B.)

Unified Compute System Manager

Part of UCS troubleshooting is verifying that UCSM is communicating correctly with the end systems.

(Diagram: UCSM, a redundant management service with management interfaces and multiple-protocol support, manages the switch elements, chassis elements, and server elements over a redundant management plane.)

UCSM access

Enable logging in Java to capture issues.

(Screenshot: example of a session log file on the client.)

Client logs for debugging UCSM access and client KVM access are found at this location on the client system:
C:\Documents and Settings\userid\Application Data\Sun\Java\Deployment\log\.ucsm

UCSM Client Logs

To find which log to view for issues with the UCSM window, check the process ID of the javaw process in Task Manager. The same file should also appear in the log area; you can also identify it by its modification time.
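To pick the right file quickly, a short script can list the client log directory newest-first. This is a sketch only: the optional `pid` filter assumes the javaw process ID appears in the log file name, which should be confirmed on your own client, and `newest_ucsm_logs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Optional


def newest_ucsm_logs(log_dir: str, pid: Optional[int] = None) -> List[Path]:
    """List files in the UCSM client log directory, newest first.

    If `pid` is given, keep only files whose name contains that number
    (assumption: the javaw PID is part of the log file name).
    """
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    if pid is not None:
        files = [p for p in files if str(pid) in p.name]
    # Sort by modification time so the current session's log comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, call it with the `.ucsm` path above (replacing `userid`) and the PID shown in Task Manager, then open the first file returned.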

Presentation_ID

2010 Cisco and/or its affiliates. All rights reserved.

Cisco Confidential

Interface Stats and reports

Statistics breakdown

Live/now

History

UCS Internal Operations

Unified Compute System Manager (UCSM) and the Data Management Engine (DME) run as a cluster
Stateful switchover
Object state is replicated

Distributed cluster state
Stored in chassis EPROM
Solves split brain
The Application Gateway (AG) interfaces to the blade

(Diagram: Fabric Interconnect A runs UCSM-A with the active DME and Fabric Interconnect B runs UCSM-B with the standby DME. On each side, an interface layer, HA controller, replicator, FSM, and persistifier (writing to flash) sit above an application gateway layer that talks to the CMC in each chassis (Chassis 1, 2, 3, ...). Each chassis has an EPROM that holds the distributed cluster state.)

Events per component

FarNorth-A# scope server ?
WORD          <chassis-id>/<blade-id>
dynamic-uuid  Dynamic UUID
FarNorth-A# scope server 1/1
FarNorth-A /chassis/server # show event

Server Discovery FSM

The FSM runs as a workflow involving many stages (FSM stages).
Workflows are predefined, and stages can be skipped if:
they are not needed (in HA, if the remote is down; no NIC configuration for Oplin)
FSM flags are set (shallow checkpoint or deep checkpoint)

Each stage is an interaction between:
DME -> Application Gateway -> End Point

The DME just manages the state of the object and the workflow, and then instructs the AG to perform the activity.
The AGs do the real work.
FSM names usually have the following notation:
FSM <Object><Workflow><Operation><Where-is-it-executed>
Object: Blade/Chassis
Workflow: Discover/Association
Operation: Pnuos-Config
(PNuOS, the Processing Node Utility OS, is a Linux-based pre-boot execution environment that can boot on a processing node to run diagnostics, report inventory, or configure the firmware state of the blade.)
Where: generally empty, or A, B, Local, or Peer
If Where is not specified, the stage is executed on the managing node.
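The naming notation can be illustrated by splitting a name into its four parts. This is a hypothetical sketch: the vocabularies come only from the examples on this slide (Blade/Chassis, Discover/Association, A/B/Local/Peer), and a name such as `BladeDiscoverPnuosConfigLocal` is an illustrative construction following the notation, not a string copied from UCSM.

```python
from typing import NamedTuple, Tuple

# Vocabularies taken from the slide's examples only; real UCSM names vary.
OBJECTS: Tuple[str, ...] = ("Blade", "Chassis")
WORKFLOWS: Tuple[str, ...] = ("Discover", "Association")
WHERE_SUFFIXES: Tuple[str, ...] = ("Local", "Peer", "A", "B")


class FsmName(NamedTuple):
    obj: str
    workflow: str
    operation: str
    where: str  # "" means: executed on the managing node


def split_fsm_name(name: str) -> FsmName:
    """Split <Object><Workflow><Operation><Where> by prefix/suffix matching."""
    obj = next((o for o in OBJECTS if name.startswith(o)), None)
    if obj is None:
        raise ValueError(f"unknown object in {name!r}")
    rest = name[len(obj):]
    workflow = next((w for w in WORKFLOWS if rest.startswith(w)), None)
    if workflow is None:
        raise ValueError(f"unknown workflow in {name!r}")
    rest = rest[len(workflow):]
    # Caveat: an operation that itself ends in a capital A or B is ambiguous.
    where = next(
        (s for s in WHERE_SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    operation = rest[: len(rest) - len(where)]
    return FsmName(obj, workflow, operation, where)
```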

FSM
Almost every action done by UCSM has an FSM to verify the operation and its status.
View and monitor each action for ongoing feedback and the progress state of the action.
Logs are kept for review and troubleshooting.

OBFL

Onboard Failure Logging (OBFL) stores hardware logs on the different components, saved at the time of the issue.
An alternate method is to view them by connecting to the device.
show tech-support captures these logs.

System Event Log (SEL) Events Supported

Server BIOS events
3 kinds of equipment end points:
Memory unit (DIMM): ECC errors, address parity, memory mismatch
Processor unit: memory mirroring, sparing, SMI link errors
Motherboard: PCIe, QPI uncorrectable errors, legacy PCI errors
All these errors are modeled as stats properties. Those for which thresholds are not defined are reported as statistics only.

The BMC, BIOS, and OS log platform errors to the BMC's System Event Log (SEL) buffer
POST and run-time errors
Used as an effective health-monitoring tool

System Event Log (SEL) - config

Users can define rules (policies) for backing up and clearing the SEL across all servers in the UCS system, or they can manually trigger a SEL backup on individual servers.

System Event Logs = Management Logs

(Screenshots: SEL views at the chassis and server level.)

Make sure that the servers are discovered.
Make sure the backup destination path is valid.
This can also be done via the CLI.

CLI navigation
SSH or Telnet to the cluster IP when possible.
You will automatically connect to the primary FI in the cluster.
Cisco UCS 6100 Series Fabric Interconnect
Using keyboard-interactive authentication.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software may be covered under the GNU Public
License or the GNU Lesser General Public License. A copy of
each such license is available at
http://www.gnu.org/licenses/gpl.html and
http://www.gnu.org/licenses/lgpl.html
FarNorth-B#

FarNorth-B# show cluster state


Cluster Id: 0xf76362a0c56011de-0x8446000decd07b44
B: UP, PRIMARY
A: UP, SUBORDINATE
HA READY

UCS CLI navigation Structure

Almost the same as NX-OS, with slight differences in layout
But the configuration is in an XML structure
FarNorth-B# ?
acknowledge     Acknowledge
backup          Backup
clear           Reset functions
commit-buffer   Commit transaction buffer
connect         Connect to Another CLI
decommission    Decommission managed objects
discard-buffer  Discard transaction buffer
end             Go to exec mode
exit            Exit from command interpreter
recommission    Recommission Server Resources
remove          Remove
scope           Changes the current mode
set             Set property values
show            Show running system information
terminal        Set terminal line parameters
top             Go to the top mode
up              Go up one mode
where           Show information about the current mode

FarNorth-B# show ?
chassis              Chassis
cli                  CLI commands
clock                Display current Date
cluster              Cluster mode
configuration        Show information about configuration sessions
eth-uplink           Ethernet Uplink
event                Event Manager commands
fabric-interconnect  Show Fabric Interconnect
fault                Fault
identity             Identity
iom                  IO Module
license              Show the contents of all the license files
org                  Organizations
security             Security mode
sel                  System Event Log
server               Server
service-profile      Service Profile
system               System-related show commands
timezone             Set timezone
version              System version
vif                  Virtual Interfaces

UCS Configuration from CLI

Not recommended as a best practice, but sometimes required because of a problem.
Mostly for direct troubleshooting, or for verifying that the configuration from UCSM is correct.
It gives you a good understanding of the XML structure, for third-party API configurations and for navigation.
As a system admin troubleshooting UCS, you will need to be somewhat familiar with the CLI.

XML configuration navigation

Configuration verification, or to show pending changes
FarNorth-A# show configuration ?
<CR>
>                Redirect it to a file
>>               Redirect it to a file in append mode
all              All
no-diff-markers  Don't Show Diff Markers
no-pending       Don't Show Pending Config
pending          Show Only Pending Config
|                Pipe command output to filter

Save off config to file

(UCSM also has backup methods)
FarNorth-A# show configuration > ?
ftp:        Dest File URI
scp:        Dest File URI
sftp:       Dest File URI
tftp:       Dest File URI
volatile:   Dest File URI
workspace:  Dest File URI

Configuration tools
FarNorth-A# show configuration | ?
cut       Print selected parts of lines.
egrep     Egrep - print lines matching a pattern
grep      Grep - print lines matching a pattern
head      Display first lines
last      Display last lines
less      Filter for paging
no-more   Turn-off pagination for command output
sort      Stream Sorter
tr        Translate, squeeze, and/or delete characters
uniq      Discard all but one of successive identical lines
vsh       The shell that understands cli command
wc        Count words, lines, characters
begin     Begin with the line that matches
count     Count number of lines
end       End with the line that matches
exclude   Exclude lines that match
include   Include lines that match
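Once a configuration has been saved off to a file (for example with `show configuration > tftp:`), the same kind of filtering can be repeated offline. The sketch below mimics `include`, `exclude`, `begin`, and `count` with Python regular expressions; it assumes the on-switch filters behave as simple line-by-line regex matches, and the UCS CLI's exact regex dialect may differ.

```python
import re
from typing import Iterable, List


def include(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep lines that match the regular expression (like `| include`)."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]


def exclude(lines: Iterable[str], pattern: str) -> List[str]:
    """Drop lines that match the regular expression (like `| exclude`)."""
    rx = re.compile(pattern)
    return [ln for ln in lines if not rx.search(ln)]


def begin(lines: Iterable[str], pattern: str) -> List[str]:
    """Start output at the first matching line (like `| begin`)."""
    rx = re.compile(pattern)
    out: List[str] = []
    for ln in lines:
        if out or rx.search(ln):
            out.append(ln)
    return out


def count(lines: Iterable[str]) -> int:
    """Number of lines (like `| count`)."""
    return sum(1 for _ in lines)
```

Read the saved file with `open(path).read().splitlines()` and chain the helpers, e.g. `count(include(lines, "vlan"))`.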

Scope
Scoping moves you to different UCS configuration components.
Details on hardware components are obtained with the connect command.

You want to be on the primary FI.

FarNorth-B# scope ?
adapter              Mezzanine Adapter
chassis              Chassis
eth-server           Ethernet Server Domain
eth-uplink           Ethernet Uplink
fabric-interconnect  Fabric Interconnect
fc-uplink            FC Uplink
firmware             Firmware
host-eth-if          Host Ethernet Interface
host-fc-if           Host FC Interface
monitoring           Monitor the system
org                  Organizations
security             Security mode
server               Server
service-profile      Service Profile
system               Systems
vhba                 VHBA

Management Commands (scope, where, up & top)

UCSM Navigation: the CLI equivalent of the Navigation pane

Connect NXOS
Connects from the UCSM (XML-based) CLI to the standard NX-OS component of the Fabric Interconnect (FI).
Helpful for troubleshooting; very familiar to IOS and Nexus users, with all the show commands.
Used to run advised debugs
Show the switch running config (non-server config)
Enable and run Ethanalyzer
Clear interface counters on the FI
Cannot be used to configure UCS (read only)

Connect

Hardware Troubleshooting

connect attaches you to the hardware and to the read-only NX-OS

FarNorth-B# connect ?
adapter     Mezzanine Adapter
bmc         Baseboard Management Controller (CIMC)
clp         Connect to DMTF CLP
iom         IO Module
local-mgmt  Connect to Local Management CLI
nxos        Connect to NXOS CLI

FarNorth-A# connect local-mgmt ?
<CR>    (defaults to the primary)
a       Fabric A
b       Fabric B

Most dangerous commands:
-erase configuration
-reboot

FarNorth-A(local-mgmt)# ?
cd                Change current directory
clear             Reset functions
cluster           Cluster mode
connect           Connect to Another CLI
copy              Copy a file
cp                Copy a file
delete            Delete managed objects
dir               Show content of dir
enable            Enable
end               Go to exec mode
erase             Erase
erase-log-config  Erase the mgmt logging config file
exit              Exit from command interpreter
install-license   Install a license
ls                Show content of dir
mkdir             Create a directory
move              Move a file
mv                Move a file
ping              Test network reachability
pwd               Print current directory
reboot            Reboots Fabric Interconnect
rm                Remove a file
rmdir             Remove a directory
run-script        Run a script
show              Show running system information
ssh               SSH to another system
tail-mgmt-log     Tail mgmt log file
telnet            Telnet to another system
terminal          Set terminal line parameters
top               Go to the top mode
traceroute        Traceroute to destination

Connect to NXOS
FarNorth-A# connect nxos ?
<CR>
a     Fabric A
b     Fabric B

FarNorth-A(nxos)# ?
clear         Reset functions (the only place you can clear counters today)
cli           CLI commands
debug         Debugging functions
debug-filter  Enable filtering for debugging functions
end           Go to exec mode
ethanalyzer   Configure cisco fabric analyzer
exit          Exit from command interpreter
no            Negate a command or set its defaults
ntp           Execute NTP commands
pop           Pop mode from stack or restore from name
push          Push current mode to stack or save it under name
show          Show running system information
system        System management commands
terminal      Set terminal line parameters
test          Test command
undebug       Disable Debugging functions (See also debug)
where         Shows the cli context you are in

Most popular examples:

show run
show fex detail
show interface
show lacp
debug
show npv flogi-table
show mac-address-table

Ethernet Interfaces on CPU

Troubleshooting uses
In Ethanalyzer terminology, the internal Ethernet interfaces are named:
eth3 = inbound-lo
eth4 = inbound-hi

eth3 handles Rx and Tx of low-priority control packets:
IGMP, CDP
TCP/UDP/IP/ARP (for management purposes only)

eth4 handles Rx and Tx of high-priority control packets:
FC (FC packets come to the switch CPU as FCoE packets) and FCoE
STP (spanning tree), LACP, DCBX (Data Center Bridging Exchange)

Save to a file and use the Wireshark tool to help diagnose the issue.

1) FarNorth-A(nxos)# ethanalyzer local interface inbound-hi write volatile:///ciscolive

2) FarNorth-A(local-mgmt)# cd volatile:///
FarNorth-A(local-mgmt)# dir
25192 May 18 11:08:17 2010 ciscolive

3) FarNorth-A(local-mgmt)# copy volatile:///ciscolive tftp:
Enter hostname for the tftp server: 10.91.42.134
Trying to connect to tftp server......
Connection to server Established. Copying Started.....
TFTP put operation was successful
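Before opening the copied capture in Wireshark, a quick sanity check can confirm that it contains packets and over what time span. The sketch below assumes Ethanalyzer wrote the file in the classic libpcap format (microsecond timestamps); if the file is pcapng instead, the function raises an error and Wireshark should be used directly. `read_pcap_records` is a hypothetical helper name.

```python
import struct
from typing import BinaryIO, List, Tuple


def read_pcap_records(f: BinaryIO) -> Tuple[int, List[Tuple[float, int]]]:
    """Parse a classic libpcap file: return (linktype, [(timestamp, captured_len)])."""
    header = f.read(24)  # global header: magic, version, zone, sigfigs, snaplen, linktype
    if len(header) < 24:
        raise ValueError("file too short for a pcap global header")
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written little-endian
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written big-endian
    else:
        raise ValueError("not a classic microsecond libpcap file (pcapng?)")
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    records: List[Tuple[float, int]] = []
    while True:
        rec = f.read(16)  # per-packet header: ts_sec, ts_usec, incl_len, orig_len
        if len(rec) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        f.read(incl_len)  # skip the packet bytes themselves
        records.append((ts_sec + ts_usec / 1e6, incl_len))
    return linktype, records
```

For example, open the copied `ciscolive` file in binary mode and pass it in; the first and last timestamps give the capture window, and the record count shows whether any control traffic was seen.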

KVM
Tool to take screen snapshots for support
A WebEx recording works best

Monitoring with UCSM and CLI

Compute System
BMC (per blade)
Voltage and current sensors (power)
Thermal sensors: DIMMs, CPUs, adapter
Sensor values available via IPMI
CMC
Per-blade totals
Per-chassis totals
PSU redundancy state
Changes are passed to UCSM
Critical transitions via async notifications
Periodic polling
UCSM maintains stats
SAM maintains state
State and stats available via GUI, CLI, API

Fabric Monitoring
VIFs: interface stats, states
Adapter: interface stats, aggregate stats, states
FEX: interface stats, states
Switch: interface stats, VIF stats, states

Data Gathering for Support

Take a UCSM detailed tech-support as soon as possible after a failure occurs. The UCSM tech-support contains a running-configuration snapshot as well as an application error/debug log.
If a problem is easily reproducible, retry the configuration attempt and collect the tech-support files immediately.
1. Collect the UCSM detailed tech-support:
A# connect local-mgmt
A(local-mgmt)# show tech-support ucsm detail
2. Collect tech-support on one or more problematic chassis (and their components, such as server, IOM, BMC):
A(local-mgmt)# show tech-support chassis <chassis id> all detail
3. Copy the collected file to tftp.cisco.com (171.69.17.19):
A(local-mgmt)# copy workspace:///techsupport/<name_of_the_file>.tar tftp://171.69.17.19

Data Gathering for Support -examples


FarNorth-A(local-mgmt)# show tech-support ucsm detail
Initiating tech-support information task on FABRIC A ...
Initiating tech-support information task on FABRIC B ...
Completed initiating tech-support subsystem tasks (Total: 2)
All tech-support subsystem tasks are completed (Total: 2)
The detailed tech-support information is located at
workspace:///techsupport/20100517125801_FarNorth_UCSM.tar

FarNorth-A(local-mgmt)# dir
16 Oct 30 09:31:03 2009 cores
31 Nov 20 13:14:20 2009 diagnostics
1024 Oct 30 09:29:05 2009 lost+found/
1024 May 17 12:59:47 2010 techsupport/

FarNorth-A(local-mgmt)# show tech-support chassis 1 all detail


Initiating tech-support information task on Chassis 1 FabricExtender1 ...
Remotely initiating tech-support information task on Chassis 1 FabricExtender2
Initiating tech-support information task on Chassis 1 FabricExtender2 ...
Initiating tech-support information task on IBMC1 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/1 ...
Initiating tech-support information task on IBMC2 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/2 ...
Initiating tech-support information task on IBMC3 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/3 ...
Initiating tech-support information task on Adaptor 2 on Chassis/Blade 1/3 ...
Initiating tech-support information task on IBMC7 on Chassis 1 ...
Initiating tech-support information task on Adaptor 1 on Chassis/Blade 1/7 ...
Completed initiating tech-support subsystem tasks (Total: 11)
All tech-support subsystem tasks are completed (Total: 11)
The detailed tech-support information is located at
workspace:///techsupport/20100517124544_FarNorth_BC001_all.tar

FarNorth-A(local-mgmt)# cd ///techsupport
FarNorth-A(local-mgmt)# ls
2140160 May 17 12:52:58 2010 20100517124544_FarNorth_BC001_all.tar
12871680 May 17 12:59:47 2010 20100517125801_FarNorth_UCSM.tar
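Because the file names begin with a timestamp, you can confirm that a tech-support bundle was taken close to the failure time before sending it. The sketch assumes names follow the `<YYYYMMDDhhmmss>_<system>_<scope>.tar` pattern seen in the listing above and that the system name contains no underscore; `parse_techsupport_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import NamedTuple


class TechSupportFile(NamedTuple):
    created: datetime
    system: str
    scope: str


# <YYYYMMDDhhmmss>_<system>_<scope>.tar (assumed pattern; system has no "_").
_NAME = re.compile(r"^(\d{14})_([^_]+)_(.+)\.tar$")


def parse_techsupport_name(name: str) -> TechSupportFile:
    """Split a tech-support file name into creation time, system and scope."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected tech-support file name: {name!r}")
    stamp, system, scope = m.groups()
    return TechSupportFile(datetime.strptime(stamp, "%Y%m%d%H%M%S"), system, scope)
```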

Core Dumps

Once the TFTP Core Exporter is configured and enabled, core dumps are transferred.
Once they have been transferred, select them and move them to the trash can.

Blade Troubleshooting

Troubleshooting Flow
For the rest of the session we will work from the blade servers up toward the LAN and SAN network:
Start: Blades -> IOM Modules -> Fabric Interconnects -> LAN-SAN: End

Common Debug Scenarios

Blades

BMC doesn't boot
Corrupt BMC BIOS, POST failure, not completing
Attempt to connect to the BMC to diagnose
View logs, collect tech-support
Bad service profile: association failure

Bad hardware
Bad DIMM(s): reseat or replace
CPU or other component: check logs

Adapter issues
Connect to the mezzanine cards to diagnose issues

BMC Troubleshooting - Debug Firmware Utility

Command    Description
mctool     Gets basic information on the state of the BMC, as reported to the UCS management API
network    See current network configuration and socket information
obfl       Live OBFL
messages   Live /var/log/messages file
alarms     Which sensors are in alarm
sensors    Current sensor readings from IPMI
power      The current power state of the x86 host

Connect CIMC

Debug Utility

Show tech detail and logs
Get a snapshot of the KVM screen
Use it to verify the health of a blade when UCSM's data is in question and you want to look at the lowest-level blade data points
FarNorth-A# connect cimc 1/3
Trying 127.5.1.3...
Connected to 127.5.1.3.
Escape character is '^]'.
BMC Debug Firmware Utility Shell
[ help ]#
Useful commands marked with arrow

__________________________________________
Debug Firmware Utility
__________________________________________
Command List
__________________________________________
alarms
cores
exit
help [COMMAND]
images
mctools
memory
messages
network
obfl
post
power
sensors
sel
fru
mezz1fru
mezz2fru
tasks
top
update
users
version
__________________________________________
Notes:
"enter Key" will execute last command
"COMMAND ?" will execute help for that command
__________________________________________

MezzCards Common Debug & Isolation Hints

Verify the physical link state between the IOM and the M81KR using show interface brief on the switch CLI.

VIC M81KR (Palo)

Verify the VIF state and vNIC state from the M81KR perspective using the show-vifs and show-systemstatus commands.
Find the VIF corresponding to the link.
Verify the M81KR-Intel/M81KR-QorE physical link state using the M81KR link event log.
Verify the state of the control channel (VIC/DCBX/VNTAG).
Verify the state of the VIF from the VIC protocol perspective (VIC log on the M81KR).
For FC, look at the FC logs for FLOGI/LS_ACC.
Look at the link state from the host perspective using host-based tools.

M71KR-Q & M71KR-E (Menlo)

M81KR - Palo Adapter

adapter 1/1/1 # help
Available commands:
connect            - Connect to remote debug shell
exit               - Exit from subshell
help               - List available commands
history            - Show command history
show-fwlist        - Show firmware versions on the adapter
show-identity      - Show adapter identity
show-phyinfo       - Show adapter phyinfo
show-systemstatus  - Show adapter status
adapter 1/1/1 # connect
adapter 1/1/1 (top):1# help
Available commands:
attach-fls     - Attach to fls
attach-mcp     - Attach to mcp
estat          - Run fc performance monitor
exit           - Exit from subshell
help           - List available commands
history        - Show command history
phy-read       - Read PHY register
show-fru       - Show FRU contents
show-fwdtab    - Show forwarding table
show-log       - Show system log
show-macstats  - Show MAC statistics

Same type of commands as M71KR

Use the connect command, then attach to the Master Control Program (MCP), the main Palo firmware application, to get more details:

adapter 1/1/1 (top):2# attach-mcp

M81KR - Adapter Debug CLI (vif info)

vnic: shows a vNIC overview
FarNorth-A# connect adapter 2/8/1
adapter 2/8/1 # connect
adapter 2/8/1 (top):1# attach-mcp
adapter 2/8/1 (mcp):1# vnic
vnicid    : internal id of vnic, use for other vnic cmds
vnicname  : ucsm provisioned name for this vnic
vnictype  : en=ethernet, fc=fcoe
vnicstate : state of vnic
lif       : internal logical if id, use for other lif/vif cmds
lifstate  : state of lif
vifuif    : bound uplink 0 or 1, =:primary, -:secondary, >:current
vifucsm   : ucsm id for this vif
vifidx    : switch id for this vif (vethXXX)
vifvlan   : default vlan for traffic
vifstate  : state of vif

Details of VIF

vif info shows network connectivity:
CoS, default VLAN, rate limits
vif info shows the address registration list:
Unicast, broadcast, multicast

adapter 2/8/1 (mcp):2# vif 2
lifid: 2
uif: 0
state: UP
adminst: UP
flags: NIV, CREATED, VIFHASH, VUP, VIFINFO
vifindex: 1241
hash: 89
priority: 0
create retries: 2
last req: VIF_ENABLE
reqstatus: OK
reqcc: SUCCESS
evtrace: LINK_UP CREATE_FAILED TIMEOUT CREATE_FAILED TIMEOUT CREATE_OK ENABLE_OK SET_UP
reg'd addrs: vlan 0 mac 00:25:b5:00:00:17
             vlan 0 mac ff:ff:ff:ff:ff:ff
             vlan 0 mac 00:00:00:00:00:00
inaddaddrs:
toaddaddrs:
indeladdrs:
todeladdrs:
provinfo.oui : 00 00 0c
provinfo.type: SAM_CA
provinfo.data.vifid: 1241
provinfo.data.cookie: 0x5285a
provinfo.data.viftype: ETH
vifinfo.priority : 0
vifinfo.vifid : 2
vifinfo.default_cos: 0
vifinfo.vifstate : E--
vifinfo.vlan : 1
vifinfo.ratelimit.burstsize : 0
vifinfo.ratelimit.rate : -1

M81KR MAC Statistics

adapter 2/8/1 (mcp):3# dcem-macstats 0
TOTAL      DESCRIPTION
24841      Tx frames len == 64
63470      Tx frames 64 < len <= 127
51113      Tx frames 128 <= len <= 255
380        Tx frames 256 <= len <= 511
225020     Tx frames 512 <= len <= 1023
160        Tx frames 1024 <= len <= 1518
2865       Tx frames 1519 <= len <= 2047
367849     Tx total packets
147903879  Tx bytes
367849     Tx good packets
346958     Tx unicast frames
20277      Tx multicast frames
614        Tx broadcast frames
25         Tx frames with VLAN tag
8          Rx Frames len == 64
1063448    Rx Frames 64 < len <= 127
41133      Rx Frames 128 <= len <= 255
24707      Rx Frames 256 <= len <= 511
2359       Rx Frames 512 <= len <= 1023
372        Rx Frames 1024 <= len <= 1518
8901       Rx Frames 1519 <= len <= 2047
1140928    Rx total received packets
110619220  Rx bytes
1140928    Rx good packets
311492     Rx unicast frames
74263      Rx multicast frames
755173     Rx broadcast frames
147903879  Tx bytes for good packets
110619220  Rx bytes for good packets
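The counters above are internally consistent: the Tx frame-size buckets add up to 367849, the Tx total, and unicast + multicast + broadcast gives the same figure; on the Rx side both sums equal 1140928. A script can run those cross-checks on a captured `dcem-macstats` output, and a mismatch is a cue to look more closely. This is a sketch assuming each line is `<count> <description>` as shown above (with or without a space after `Tx`/`Rx`); the function names are hypothetical.

```python
import re
from typing import Dict

# "<count> <description>" lines; the "TOTAL DESCRIPTION" header does not match.
_LINE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_macstats(text: str) -> Dict[str, int]:
    """Map each counter description to its value."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            stats[m.group(2)] = int(m.group(1))
    return stats


def _sum(stats: Dict[str, int], pattern: str) -> int:
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(v for k, v in stats.items() if rx.search(k))


def check_macstats(stats: Dict[str, int]) -> Dict[str, bool]:
    """Cross-check size buckets and cast types against the total, per direction."""
    results: Dict[str, bool] = {}
    for d in ("Tx", "Rx"):
        total = _sum(stats, rf"^{d}\s*total")
        # \s* also accepts run-together labels such as "Txframes".
        buckets = _sum(stats, rf"^{d}\s*frames\b.*\blen\b")
        casts = _sum(stats, rf"^{d}\s*(unicast|multicast|broadcast)\s*frames")
        results[f"{d} size buckets == total"] = buckets == total
        results[f"{d} unicast+multicast+broadcast == total"] = casts == total
    return results
```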

Adapter Debug CLI (logs)

show-log displays the internal adapter logs

adapter 1/3/1 (top):2# show-log
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0 set to VNTAG mode
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0: Running
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vif[289]-6-uif0 starting link up in niv mode
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: peer eth0.0 00:0d:ec:6d:b8:3c start
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.uif[289]-6-Port 0 FSM: WAIT_NIVDELAYTIMEO/RXVNTAG => RUNNING
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: starting timer for peer VIC_OPEN
2009 Oct 5 16:21:15 palo %BCxx_MEZZxxxx_mcp.vic[289]-6-vic0: app_start_done flags OPEN_SENT status OK
...

Memory errors
Check the System Event Log (SEL) and faults

sh sel 2/1

5ed| 03/29/2010 02:20:50 | Memory 0x02| Uncorrectable ECC/other uncorrectable memory error | Rank: 0, DIMMSocket: 1, Channel: C, Socket: 0 | Asserted
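When many SEL entries are involved, splitting each record on `|` makes it easy to sort or filter them, for example by DIMM socket and channel. The sketch assumes every record has the six pipe-separated fields shown in the example above and that the detail field is a comma-separated list of `key: value` pairs; `parse_sel_line` is a hypothetical helper.

```python
from typing import Dict, NamedTuple


class SelRecord(NamedTuple):
    record_id: str
    timestamp: str
    sensor: str
    description: str
    details: Dict[str, str]
    state: str


def parse_sel_line(line: str) -> SelRecord:
    """Split one SEL record into its six '|'-separated fields."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    record_id, timestamp, sensor, description, detail_text, state = fields
    details: Dict[str, str] = {}
    for item in detail_text.split(","):
        key, sep, value = item.partition(":")
        if sep:  # ignore fragments without a "key: value" shape
            details[key.strip()] = value.strip()
    return SelRecord(record_id, timestamp, sensor, description, details, state)
```

Filtering the parsed records on `"Uncorrectable ECC" in rec.description` and grouping by `rec.details.get("Channel")` quickly shows whether errors cluster on one DIMM.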

What to gather and look at for memory issues

On CIMC: run show tech
On KVM: capture the BIOS version
On KVM (BIOS setup): capture the memory configuration
On CIMC: capture the BIOS version
On CIMC: capture the memory inventory
Show memory details (take a screenshot)

Reboots
You need to find the reason for the hardware reboot:
BMC (CIMC): issue in hardware/firmware on the server
UCS service profile: caused by a profile change/issue
Other hardware on the blade: CPU, memory
User induced: reset button

Blade Reboots

Viewing OBFL for the reason of a reboot

Reboot caused by pressing the front-panel button:


0:2009 Dec 29 19:45:04:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Dec 29 19:45:04:BMC:kernel::<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Deasserted
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = OFF
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>USB HS: VDD Power WAKEUP-Power Good = OFF
1:2009 Dec 29 19:45:04:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000 files.
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[0]
5:2009 Dec 29 19:45:04:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[1]
5:2009 Dec 29 19:45:05:BMC:IPMI:470: Pilot2SrvPower.c:369:Blade Power Changed To: [ OFF ]
5:2009 Dec 29 19:45:05:BMC:IPMI:497: VirtualSEL.c:26:SELEvt[02 0D]< C10B02 41 5C3A4B20 00 04 25 52 08 00 FF FF>
4:2009 Dec 29 19:45:34:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Asserted
5:2009 Dec 29 19:45:34:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = ON

This is the signature of a hardware failure (power off followed by power on in 4-5 seconds; an Intel feature that reacts to hardware failure):
0:2009 Nov 25 11:44:55:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Nov 25 11:44:55:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Deasserted
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = OFF
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>USB HS: VDD Power WAKEUP-Power Good = OFF
1:2009 Nov 25 11:44:55:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000 files.
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[0]
5:2009 Nov 25 11:44:55:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[1]
4:2009 Nov 25 11:44:55:BMC:kernel:-:<4>kbdmouse_write: mouse write aborted for device reset.
5:2009 Nov 25 11:44:55:BMC:IPMI:472: Pilot2SrvPower.c:369:Blade Power Changed To: [ OFF ]
5:2009 Nov 25 11:44:55:BMC:IPMI:500: VirtualSEL.c:26:SELEvt[22 02]< 22 02 02 B718 0D4B20 00 04 25 52 08 00 FF FF>
3:2009 Nov 25 11:45:16:BMC:doctor-bmc:584: doctor-bmc.c:1143:Tcp-> Connection between remote ip 0xFE00037F at port 0x86A4 and local ip 0x200037F at port 0xFAA is in TCP_TIME_WAIT state for at least 2 min 30 seconds.
3:2009 Nov 25 11:45:16:BMC:doctor-bmc:584: doctor-bmc.c:1155:Tcp-> Total Errors Found: 1
5:2009 Nov 25 11:45:21:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:266:do_power_on
The addresses are stored with their bytes reversed: remote ip 0xFE00037F = 254 0 3 127, read in reverse as 127.3.0.254 (the CMC0 interface to the blades), and local ip 0x200037F = 2 0 3 127, read in reverse as 127.3.0.2.
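The decoding is a byte-order reversal: the 32-bit value is printed as a hex number, but its bytes are stored least-significant first, so applying the same reversal to both values gives 127.3.0.2 for the local address and 127.3.0.254 for the remote one. Python's `struct` and `socket` modules can do the conversion; `obfl_hex_ip_to_dotted` is a hypothetical helper name.

```python
import socket
import struct


def obfl_hex_ip_to_dotted(value: int) -> str:
    """Decode an OBFL 'ip0x...' value whose bytes are stored little-endian."""
    # Pack least-significant byte first, then format as dotted-quad IPv4.
    return socket.inet_ntoa(struct.pack("<I", value))
```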

Blade Reboots

Viewing OBFL for the reason

This is an actual customer power reset from UCSM (power on after 8 minutes):


0:2009 Dec 22 17:16:26:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates Reset occurred
4:2009 Dec 22 17:16:26:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Deasserted
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = OFF
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>USB HS: VDD Power WAKEUP-Power Good = OFF
1:2009 Dec 22 17:16:26:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000 files.
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[0]
5:2009 Dec 22 17:16:26:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[1]
5:2009 Dec 22 17:16:27:BMC:IPMI:474: Pilot2SrvPower.c:369:Blade Power Changed To: [ OFF ]
5:2009 Dec 22 17:16:27:BMC:IPMI:511: VirtualSEL.c:26:SELEvt[98 02]< 98 02 02 EBFE 30 4B20 00 04 25 52 08 00 FF FF>
5:2009 Dec 22 17:24:49:BMC:mctool@127.5.254.1:1275: mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power] "Power Cycle"
5:2009 Dec 22 17:24:49:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:313:do_cycle
5:2009 Dec 22 17:24:49:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:232:do_power_off
5:2009 Dec 22 17:24:59:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:266:do_power_on
4:2009 Dec 22 17:24:59:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Asserted
5:2009 Dec 22 17:24:59:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = ON

Blade Reboots

Viewing OBFL for the reason

This is an IPMI request coming from UCSM, either as an authorized reboot or as a result of the desired power state being OFF.
5:2009 Dec 23 18:16:58:BMC:mctool@127.5.254.1:1275: mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power] "Power Off" <---indicator that an IPMI initiated reset has occurred.
5:2009 Dec 23 18:16:58:BMC:kernel:-:<5>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/pilot2_power/pilot2_power.c:232:do_power_off
0:2009 Dec 23 18:17:03:BMC:kernel::<0>LPCReset ISR-> ResetState: 1 <---this indicates you've entered Reset for whatever reason
4:2009 Dec 23 18:17:03:BMC:kernel:-:<4>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/vdd_pwr_good/gooding/vdd_pwr_good_cb.c:19:Platform is Gooding: Deasserted
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>USB FS: VDD Power WAKEUP-Power Good = OFF
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>USB HS: VDD Power WAKEUP-Power Good = OFF
1:2009 Dec 23 18:17:03:BMC:kernel::<1>/nuova/builds1/ca-ventura_1-build/091027-100438-rev34618-FCSd/bmc/drivers/block_transfer/block_transfer.c:564:block_transfer_deallocate_entire_list--> Dumped: 0x0000 files.
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[0]
5:2009 Dec 23 18:17:03:BMC:kernel:-:<5>handle_exception: Handling MSD_STATE_DISCONNECT for interface[1]
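The OBFL examples suggest a simple triage: an `mcserver_set_vdd_power` line before the reset points to an IPMI request from UCSM, a reset with no such request points to the front-panel button or a hardware event, and the gap between `Power Good = OFF` and `Power Good = ON` shows how long the blade stayed off. The sketch extracts the first timestamp of each marker from OBFL lines. The marker strings are copied from these logs, but the triage wording is this document's interpretation rather than an official decoder; it assumes the lines are in chronological order and cover a single event, and the helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable

# "<severity>:<YYYY Mon DD HH:MM:SS>:" prefix used by the OBFL lines.
_PREFIX = re.compile(r"^\d:(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}):")

# Marker strings copied from the sample logs.
MARKERS: Dict[str, str] = {
    "ipmi_request": "mcserver_set_vdd_power",
    "reset": "ResetState: 1",
    "power_good_off": "Power Good = OFF",
    "power_good_on": "Power Good = ON",
}


def obfl_timeline(lines: Iterable[str]) -> Dict[str, datetime]:
    """Return the first timestamp at which each marker appears."""
    seen: Dict[str, datetime] = {}
    for line in lines:
        m = _PREFIX.match(line)
        if m is None:
            continue  # wrapped continuation lines carry no timestamp
        ts = datetime.strptime(m.group(1), "%Y %b %d %H:%M:%S")
        for name, needle in MARKERS.items():
            if name not in seen and needle in line:
                seen[name] = ts
    return seen


def describe(timeline: Dict[str, datetime]) -> str:
    """Summarize who requested the power change and how long power was off."""
    source = "IPMI request (UCSM)" if "ipmi_request" in timeline else "no IPMI request logged"
    if "power_good_off" in timeline and "power_good_on" in timeline:
        off = (timeline["power_good_on"] - timeline["power_good_off"]).total_seconds()
        return f"{source}; power good restored after {off:.0f} s"
    return f"{source}; power good not restored in these lines"
```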

For all resets, the DME logs should also be viewed for more information.
The DME log (svc_sam_dme.log) is found in /var/sysmgr/sam_logs/ inside the .tar file produced by show tech-support ucsm detail:
A# connect local-mgmt
A(local-mgmt)# show tech-support ucsm detail
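Locating `svc_sam_dme.log` inside a large tech-support bundle can be scripted with Python's standard `tarfile` module. The sketch searches the archive and, in case the bundle contains nested tar archives (an assumption; check your own bundle's layout), recurses into members ending in `.tar`, `.tar.gz`, or `.tgz`. `find_member_paths` is a hypothetical helper name.

```python
import io
import tarfile
from typing import IO, List

NESTED_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def find_member_paths(fileobj: IO[bytes], target: str, prefix: str = "") -> List[str]:
    """Return archive paths of files named `target`, descending into nested tars.

    Nested paths are written as 'outer_member!inner_member'.
    """
    hits: List[str] = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            path = prefix + member.name
            if member.name == target or member.name.endswith("/" + target):
                hits.append(path)
            elif member.name.endswith(NESTED_SUFFIXES):
                inner = tar.extractfile(member)
                if inner is not None:
                    # Read the nested archive into memory and search it too.
                    hits.extend(find_member_paths(io.BytesIO(inner.read()), target, path + "!"))
    return hits
```

For example, open `20100517125801_FarNorth_UCSM.tar` in binary mode and call `find_member_paths(f, "svc_sam_dme.log")`.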

Serial over LAN (SoL)

Requires a Serial over LAN policy and an IPMI profile to be configured, then applied to the service profile
Accessed via the same IP address as KVM
Can be configured on the fly and applied to the service profile without disruption
Uses the open-source ipmitool utility

http://ipmitool.sourceforge.net/
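
A SoL session is opened with ipmitool against the blade's KVM/BMC address, using a user from the IPMI profile. A minimal Python sketch that builds the command line; the address and user name are hypothetical placeholders.

```python
import shlex


def sol_command(bmc_ip, ipmi_user):
    """Build the ipmitool argv that opens a Serial over LAN session to a blade BMC.

    -I lanplus selects the IPMI v2.0 (RMCP+) interface, which SoL needs.
    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it does not appear on the command line or in `ps` output.
    """
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_ip,
        "-U", ipmi_user,
        "-E",
        "sol", "activate",
    ]


if __name__ == "__main__":
    # Hypothetical values: the blade's KVM/BMC address and an IPMI profile user.
    print(shlex.join(sol_command("192.0.2.10", "ipmiuser")))
```

The list can be passed to subprocess.run() with IPMI_PASSWORD set in its environment. Within the session, the ipmitool escape sequence ~. ends it; if a stale session blocks a new one, replacing activate with deactivate releases it.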

[Figure: an IPMI user on the management network accesses the blade's BMC interface over a Serial over LAN connection, using the KVM endpoint IP address of the blade.]

IPMI
IPMI does not run on the OS installed on the blade
Totally independent of the installed OS; runs even if the OS is down

IPMI runs on the Baseboard Management Controller (BMC)

Supports serviceability in four main areas:
System Event Log (SEL): OS watchdog, hardware alerts, etc.
Sensor Data Repository (SDR): temperature controls, inventory, etc.
Power control
Serial over LAN
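
The SEL is usually read with ipmitool ... sel list, using the same connection options as for SoL. A minimal Python sketch for filtering that output; it assumes the common six-field layout (id | date | time | sensor | event | direction) with a record ID assumed to be printed in hexadecimal, and it skips lines that do not fit. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SelEntry:
    record_id: int
    date: str
    time: str
    sensor: str
    event: str
    direction: str


def parse_sel_list(text):
    """Parse `ipmitool sel list` output into SelEntry objects."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6:
            continue  # not in the assumed six-field layout
        try:
            record_id = int(fields[0], 16)  # assumed hexadecimal record ID
        except ValueError:
            continue
        entries.append(SelEntry(record_id, *fields[1:]))
    return entries


def asserted(entries, sensor_substring=""):
    """Entries in the Asserted direction, optionally limited to one sensor."""
    return [e for e in entries
            if e.direction == "Asserted" and sensor_substring in e.sensor]
```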

DMIDECODE

http://www.nongnu.org/dmidecode/

Dmidecode reports information about your system's hardware as described in your system BIOS according to the SMBIOS/DMI standard.
This typically includes the usage status of the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).
Supports Linux and Windows
dmidecode --type {KEYWORD | number}, where KEYWORD is one of:
bios, system, baseboard, chassis, processor, memory, cache, connector, slot
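
For example, dmidecode --type memory lists one Memory Device section per DIMM slot, which is a quick way to confirm that the OS sees every DIMM that UCSM reports. A minimal Python sketch; it assumes the usual text layout (sections separated by blank lines, Key: value lines, empty slots reported as "No Module Installed") and needs root to run dmidecode. Function names are hypothetical.

```python
import subprocess


def dimm_population(dmidecode_text):
    """Map each DIMM locator to the Size reported for it."""
    slots = {}
    for section in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in section.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip Physical Memory Array and other sections
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Locator" in fields:
            slots[fields["Locator"]] = fields.get("Size", "unknown")
    return slots


def empty_slots(slots):
    """Locators reported as empty."""
    return sorted(loc for loc, size in slots.items() if size == "No Module Installed")


def read_dmidecode_memory():
    # Runs `dmidecode --type memory`; requires root on Linux.
    result = subprocess.run(["dmidecode", "--type", "memory"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```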

IOM (FEX) Troubleshooting

Troubleshooting Flow
We will work from the blade servers up toward the LAN and SAN network:
Blades (start) -> IOM modules -> Fabric Interconnects -> LAN/SAN (end)

IOM connections: chassis backplane view

[Figure: chassis backplane; each blade slot connects to IOM1 over path A and to IOM2 over path B.]
Half-width servers: 1 mezz card (one A path and one B path)
Full-width servers: 2 mezz cards (two A paths and two B paths)

FarNorth-A(nxos)# show fex
  FEX         FEX             FEX        FEX
Number    Description      State      Model         Serial
------------------------------------------------------------------------
1         FEX0001          Online     N20-C6508     QCI132800SN
2         FEX0002          Online     N20-C6508     QCI131600Z9
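
With several chassis, it can help to parse this table and list any FEX that is not Online. A minimal Python sketch; it assumes the column layout shown above and a description without spaces (the state may contain spaces). The function names are hypothetical.

```python
def parse_show_fex(text):
    """Return {fex_number: (description, state, model, serial)} from `show fex`.

    Assumes the description has no spaces; the state may (e.g. two-word states).
    """
    fexes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # header, separator or wrapped line
        description, state = parts[1], " ".join(parts[2:-2])
        fexes[int(parts[0])] = (description, state, parts[-2], parts[-1])
    return fexes


def not_online(fexes):
    """FEX numbers whose state is anything other than Online."""
    return sorted(number for number, info in fexes.items() if info[1] != "Online")
```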

IOM connections
Each IOM (aka Fabric Extender) provides:
8+1 internal IO channels (8 slots + 1 internal management network)
4 external ports (10 Gbps each; no EtherChannel in the 1st release)

The servers' mezz cards use those IO channels for external connectivity
Servers with one mezz card use one IO channel per IOM
vNIC1 can, for instance, use IOM1 while vNIC2 uses IOM2
This vNIC-to-IOM routing is flexible and user-configurable

Servers with two mezz cards use two IO channels per IOM

Server vNICs are automatically pinned to fabric links
Each IOM actually provides a 9th internal IO channel for internal management connectivity

Viewing Blade ports

These interfaces (from sh int brief at the NX-OS prompt) are backplane traces

EthX/Y/Z, where
X = chassis number
Y = mezz card number (always 1 with half-width blades)
Z = IOM port number (the slot where the blade server resides)

IOM to Fabric Interconnect connections

UCSM calls these ports server ports
NX-OS CLI calls them fex-fabric interfaces
Note: these EthX/Y ports are interfaces on the fabric interconnects

There can be 1, 2 or 4 ports between an IOM and an FI


FarNorth-A(nxos)# sh interface fex-fabric
        Fabric      Fabric        Fex                 FEX
Fex     Port        Port State    Uplink    Model         Serial
---------------------------------------------------------------
1       Eth1/1      Active        1         N20-C6508     QCI132800SN
1       Eth1/2      Active        2         N20-C6508     QCI132800SN
2       Eth1/5      Active        2         N20-C6508     QCI131600Z9
2       Eth1/6      Active        1         N20-C6508     QCI131600Z9
interface Ethernet1/1
  switchport mode fex-fabric
  pinning server
  fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left
  no shutdown
interface Ethernet1/2
  switchport mode fex-fabric
  pinning server
  fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left

Actual IOM-to-FI pinning scheme

Server slots pinned to uplinks (an IOM with 8 server slots, connected to the switch by 1, 2 or 4 links):

1 link
Uplink: slots 1,2,3,4,5,6,7,8
How to read this: with one IOM-to-FI link, all servers use that link

2 links
Uplink 1: slots 1,3,5,7
Uplink 2: slots 2,4,6,8
How to read this: with two IOM-to-FI links, servers in slots 1,3,5,7 use link number 1 while the other slots use link number 2

4 links
Uplink 1: slots 1,5
Uplink 2: slots 2,6
Uplink 3: slots 3,7
Uplink 4: slots 4,8
How to read this: with four IOM-to-FI links, servers in slots 1 and 5 use link 1, slots 2 and 6 use link 2, slots 3 and 7 use link 3, and slots 4 and 8 use link 4
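
The three tables follow a single rule: with n links, a server in slot s uses link ((s - 1) mod n) + 1. A minimal Python sketch of that rule, derived from the tables above rather than from Cisco source; the function name is hypothetical.

```python
def pinned_uplink(slot: int, links: int) -> int:
    """Return the IOM-to-FI link (1-based) that a server slot is pinned to.

    Reproduces the static scheme tabulated above for 1, 2 or 4 links.
    """
    if links not in (1, 2, 4):
        raise ValueError("an IOM uses 1, 2 or 4 links to the fabric interconnect")
    if not 1 <= slot <= 8:
        raise ValueError("half-width server slots are numbered 1-8")
    return (slot - 1) % links + 1
```

For example, pinned_uplink(7, 2) returns 1 and pinned_uplink(8, 2) returns 2. This agrees with the pinning verification output that follows, where Ethernet1/1/7 is pinned to Eth1/1 (uplink 1 of FEX 1) and Ethernet2/1/8 to Eth1/5 (uplink 2 of FEX 2, per show interface fex-fabric).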

Verifying IOM-to-FI pinning

FarNorth-A(nxos)# show run interface ethernet1/1/7
version 4.1(3)N2(1.3)
interface Ethernet1/1/7
  vntag max-vifs 30
  pinning server
  fabric-interface Eth1/1
  no shutdown
FarNorth-A(nxos)# show run interface ethernet2/1/8
version 4.1(3)N2(1.3)
interface Ethernet2/1/8
  vntag max-vifs 30
  pinning server
  fabric-interface Eth1/5
  no shutdown

Good for identifying the proper path to the mezz adapter.
E.g.: IOM1 slot 7 is pinned to fabric interface Eth1/1; IOM2 slot 8 is pinned to fabric interface Eth1/5. Use show run interface ethernet X/Y/Z to verify.
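
To check many blades at once, the show run interface output can be collected per port and compared with the pinning you expect. A minimal Python sketch; how the output is collected is left out, and the function names are hypothetical.

```python
import re

_FABRIC_IF = re.compile(r"^\s*fabric-interface\s+(\S+)\s*$", re.MULTILINE)


def fabric_interface(running_config):
    """Return the fabric port on the 'fabric-interface' line, or None if absent."""
    match = _FABRIC_IF.search(running_config)
    return match.group(1) if match else None


def pinning_mismatches(configs, expected):
    """Compare {blade_port: running-config text} against {blade_port: fabric port}.

    Returns {blade_port: (expected, actual)} for every port that differs.
    """
    mismatches = {}
    for port, want in expected.items():
        got = fabric_interface(configs.get(port, ""))
        if got != want:
            mismatches[port] = (want, got)
    return mismatches
```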

Show Fex Detail

FEX: 1 Description: FEX0001 state: Online
  FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
  FEX Interim version: 4.1(3)N2(1.2.168a)
  Switch Interim version: 4.1(3)N2(1.2.168a)
  Chassis Model: N20-C6508, Chassis Serial: FOX1327GKGN
  Extender Model: N20-I6584, Extender Serial: QCI132800SN
  Part No: 73-11623-04
  Card Id: 67, Mac Addr: 00:26:51:08:67:f4, Num Macs: 10
  Module SwGen: 12594 [Switch SwGen: 21]
  pinning-mode: static Max-links: 1
  Fabric port for control traffic: Eth1/1
  Fabric interface state:
    Eth1/1 - Interface Up. State: Active
    Eth1/2 - Interface Up. State: Active
  Fex Port      State    Fabric Port    Primary Fabric
  Eth1/1/1      Up       Eth1/1         Eth1/2
  Eth1/1/2      Up       Eth1/2         Eth1/2
  Eth1/1/3      Up       Eth1/1         Eth1/2
  Eth1/1/4      Up       Eth1/2         Eth1/2
  Eth1/1/7      Up       Eth1/1         Eth1/2
  Eth1/1/9      Up       Eth1/2         Eth1/2

FEX: 2 Description: FEX0002 state: Online
  FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
  FEX Interim version: 4.1(3)N2(1.2.168a)
  Switch Interim version: 4.1(3)N2(1.2.168a)
  Chassis Model: N20-C6508, Chassis Serial: FOX1317G26R
  Extender Model: N20-I6584, Extender Serial: QCI131600Z9
  Part No: 73-11623-04
  Card Id: 67, Mac Addr: 00:24:97:1f:6d:aa, Num Macs: 10
  Module SwGen: 12594 [Switch SwGen: 21]
  pinning-mode: static Max-links: 1
  Fabric port for control traffic: Eth1/5
  Fabric interface state:
    Eth1/5 - Interface Up. State: Active
    Eth1/6 - Interface Up. State: Active
  Fex Port      State    Fabric Port    Primary Fabric
  Eth2/1/1      Up       Eth1/6         Eth1/5
  Eth2/1/2      Up       Eth1/5         Eth1/5
  Eth2/1/8      Up       Eth1/5         Eth1/5
  Eth2/1/9      Up       Eth1/5         Eth1/5
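
To see at a glance how blade ports are spread across the fabric links, the port table can be grouped by its Fabric Port column. A minimal Python sketch, assuming four-column rows as shown above; the function names are hypothetical.

```python
from collections import defaultdict


def ports_by_fabric_link(detail_text):
    """Group FEX (blade-facing) ports by the fabric port they are pinned to.

    Assumes port-table rows of the form: <Fex Port> <State> <Fabric Port> <Primary Fabric>.
    """
    groups = defaultdict(list)
    for line in detail_text.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[0].startswith("Eth") or parts[0].count("/") != 2:
            continue  # not a port-table row
        fex_port, state, fabric_port, _primary_fabric = parts
        groups[fabric_port].append((fex_port, state))
    return dict(groups)


def link_load(groups):
    """Number of FEX ports pinned to each fabric port."""
    return {fabric_port: len(ports) for fabric_port, ports in sorted(groups.items())}
```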

Understanding the Virtual Interface

Servers with one mezz card present two 10GE interfaces externally to the Fabric Interconnects
The server OS views the interfaces as 10GE NICs and HBAs, depending on the configuration specified in the service profile
These northbound interfaces can carry both Ethernet and FC traffic (FCoE), so we need a mechanism to identify the originating server
The concept of a Virtual Interface, or VIF, is created for this (see next slide)

Virtual interfaces (VIF)

[Figure: on Blade 1, the OS sees southbound (OS-side) interfaces veth0, veth1, vhba0 and vhba1. The external 10GE ports of the mezz card connect to EthX/Y/Z interfaces on IOM 1 and IOM 2, and a virtual interface tag associates each frame with a VIF; VIF 1 and VIF 2 are carried over the IOM-to-FI link to Fabric A, VIF 3 and VIF 4 to Fabric B.]
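
The virtual interface tag is what lets the fabric interconnect tell which VIF a frame belongs to. As an illustration only, the following Python sketch decodes such a tag assuming the pre-standard Cisco VN-Tag layout as publicly described: EtherType 0x8926 followed by a 32-bit field holding a direction bit, a pointer bit, a 14-bit destination VIF, a looped bit, a reserved bit, a 2-bit version and a 12-bit source VIF. That layout is not given in this document, so verify it before relying on it; all names are hypothetical.

```python
import struct
from typing import NamedTuple

VNTAG_ETHERTYPE = 0x8926  # assumed VN-Tag EtherType


class VnTag(NamedTuple):
    direction: int  # d bit
    pointer: int    # p bit: destination is a VIF list rather than one VIF
    dst_vif: int    # 14-bit destination VIF
    looped: int     # l bit
    version: int    # 2-bit version
    src_vif: int    # 12-bit source VIF


def decode_vntag(tag6: bytes) -> VnTag:
    """Decode the 6 bytes that start at the (assumed) VN-Tag EtherType."""
    ethertype, word = struct.unpack("!HI", tag6[:6])  # network byte order
    if ethertype != VNTAG_ETHERTYPE:
        raise ValueError(f"not a VN-Tag (EtherType 0x{ethertype:04x})")
    return VnTag(
        direction=(word >> 31) & 0x1,
        pointer=(word >> 30) & 0x1,
        dst_vif=(word >> 16) & 0x3FFF,
        looped=(word >> 15) & 0x1,
        version=(word >> 12) & 0x3,  # bit 14 (reserved) is ignored
        src_vif=word & 0xFFF,
    )
```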

Attaching to FEX
FarNorth-A# connect iom ?
  <1-255>  Chassis ID
FarNorth-A# connect iom 1
Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.'
Bad terminal type: "xterm". Will assume vt100.

From the FEX attach CLI, the user can monitor CPU, memory, etc.:
show system resources
show process cpu
show process memory
show system uptime

VIFs
Ethernet and FC are muxed on the same physical links
The concept of virtual interfaces (VIFs) is used to split Eth and FC
Two types of VIFs: veth and vfc
veth for Ethernet; vfc for FC traffic

Each EthX/Y/Z interface typically has multiple VIFs attached to it to carry traffic to and from a server
To find all VIFs associated with an EthX/Y/Z interface, do this:
FarNorth-A(nxos)# show vifs interface ethernet2/1/8
Interface    VIFS
----------------------------------------------------------------------
Eth2/1/8     veth1241, veth1243, veth9461, veth9463
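
The same command also works on a veth (in the FCoE example that follows, show vifs interface vethernet9463 lists vfc1271), so the two outputs can be chained to trace an EthX/Y/Z interface down to its vfc. A minimal Python parsing sketch, assuming the two-column layout shown above with comma-separated VIF names; the function names are hypothetical.

```python
def parse_show_vifs(text):
    """Map each interface to the VIFs listed for it by `show vifs interface ...`."""
    result = {}
    for line in text.splitlines():
        name, _, rest = line.strip().partition(" ")
        if not rest or name == "Interface" or name.startswith("-"):
            continue  # header, separator or blank line
        vifs = [vif.strip() for vif in rest.split(",") if vif.strip()]
        if vifs:
            result[name] = vifs
    return result


def trace_vifs(eth_output, veth_outputs):
    """Return {veth: [vfc, ...]} for the VIFs of one EthX/Y/Z interface.

    veth_outputs holds the raw `show vifs interface vethernetN` text per veth.
    """
    chain = {}
    for vifs in parse_show_vifs(eth_output).values():
        for veth in vifs:
            chain[veth] = parse_show_vifs(veth_outputs.get(veth, "")).get(veth, [])
    return chain
```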

VIFs for FC traffic (FCoE)

FarNorth-A(nxos)# show vifs interface ethernet2/1/8
Interface    VIFS
----------------------------------------------------------------------
Eth2/1/8     veth1241, veth1243, veth9461, veth9463

FarNorth-A(nxos)# sh int vethernet9463
vethernet9463 is up
  Bound Interface is Ethernet2/1/8
  Hardware: VEthernet
  Encapsulation ARPA
  Port mode is access
  Last link flapped 1week(s) 1day(s)
  Last clearing of "show interface" counters never
  1 interface resets
FarNorth-A(nxos)# show int vfc1271
vfc1271 is up
  Bound interface is vethernet9463
  Hardware is Virtual Fibre Channel
  Port WWN is 24:f6:00:0d:ec:d0:7b:7f
  Admin port mode is F, trunk mode is off
  snmp link state traps are enabled
  Port mode is F, FCID is 0x710005
  Port vsan is 100

All VIFs associated with an EthX/Y/Z interface are pinned to the fabric port that the EthX/Y/Z interface is pinned to.
VIFs in the 10000+ range are used for FC traffic. Check the VLAN-to-VSAN mapping (show vlan fcoe).
FarNorth-A(nxos)# show vifs interface vethernet9463
Interface    VIFS
----------------------------------------------------------------------
veth9463     vfc1271

FCoE VLAN is 100

FarNorth-A(nxos)# show vlan fcoe
VLAN      VSAN      Status
--------  --------  --------
1         1         Operational
100       100       Operational
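
In the example, vfc1271 reports port VSAN 100 and FCoE VLAN 100 maps to VSAN 100, so the chain is consistent. A minimal Python sketch for making that VLAN-to-VSAN check, assuming the three-column layout shown above; the function names are hypothetical.

```python
def parse_vlan_fcoe(text):
    """Return {vlan: (vsan, status)} from `show vlan fcoe` output."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            mapping[int(parts[0])] = (int(parts[1]), parts[2].strip())
    return mapping


def vsan_for_vlan(mapping, vlan):
    """VSAN mapped to an FCoE VLAN; raises if missing or not Operational."""
    if vlan not in mapping:
        raise KeyError(f"FCoE VLAN {vlan} is not mapped to any VSAN")
    vsan, status = mapping[vlan]
    if status != "Operational":
        raise RuntimeError(f"VLAN {vlan} -> VSAN {vsan} is {status}")
    return vsan
```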

Redwood Connection Information

show tech-support fex <1 or 2>

This captures the output needed to determine congestion, packet counters, and pause control on the server ports and network ports of the IOM
The next few slides show a few examples of the output

Redwood Traffic Information

Traffic rates on the IOM
Shows pause frames and drops when investigating performance concerns
[Figure: RMON stats output]

Top commands for debugging

# Port Info
show clock
show platform fwm event-history lif <PORT>
show system internal ethpm info interface <PORT>
show system internal ethpm event-history interface <PORT>
show platform software dcbx internal info interface <PORT>
show platform software dcbx internal errors
show platform software sifmgr info interface <PORT>
show clock

# IOM
connect local-mgmt <fabric>
connect iom <chassis_id>
terminal length 0
show platform software redwood sts
show platform software redwood oper
show platform software redwood log
show platform software redwood elog
show platform software redwood ilog
show platform software redwood ints

# Global Info
show clock
show platform fwm event-history errors
show platform fwm event-history msgs
show platform fwm errors
show system internal ethpm event-history errors
show system internal ethpm info trace
show system internal ethpm event-history msgs
show platform software sifmgr event-history errors
show platform software sifmgr event-history lock
show platform software sifmgr info trace
show platform software sifmgr event-history msgs
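
When the port commands have to be run for several ports, generating the list avoids typos. A minimal Python sketch that fills <PORT> into the # Port Info commands; the port name format (e.g. ethernet1/1/1) is an assumption, and because the command spellings were reconstructed from the slide, confirm them with ? on your NX-OS release. Names are hypothetical.

```python
PORT_INFO_COMMANDS = [
    "show clock",
    "show platform fwm event-history lif {port}",
    "show system internal ethpm info interface {port}",
    "show system internal ethpm event-history interface {port}",
    "show platform software dcbx internal info interface {port}",
    "show platform software dcbx internal errors",
    "show platform software sifmgr info interface {port}",
    "show clock",
]


def port_info_commands(*ports):
    """Return the # Port Info command list for each port, with paging disabled."""
    commands = ["terminal length 0"]
    for port in ports:
        commands.extend(template.format(port=port) for template in PORT_INFO_COMMANDS)
    return commands


if __name__ == "__main__":
    print("\n".join(port_info_commands("ethernet1/1/1", "ethernet1/1/2")))
```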

Fabric Interconnect Troubleshooting


6100 Fabric Interconnect Troubleshooting

Understanding the Fabric Port Manager

Physical link issues
Server links
FEX links
DCBX discovery
MAC address functions in End Host Mode

Fabric Port Management

Managed by UCS Manager as part of the overall chassis discovery process
The number of deployed fabric ports is defined in the UCS Manager service profile
A change in the number of deployed fabric ports requires re-acknowledging the chassis
Supports explicit pinning only, as determined by UCS Manager
UCS Manager recalculates the pinning distribution when fabric port(s) go down
Supports 1, 2 or 4 fabric ports only
No support for fabric port channel

Troubleshooting 10GbE - Link Not Coming Up

Check the PHY driver software link state:

switch# show hardware internal gatos port ethernet1/19 xcvr info
Port 0/18:
State: UP
XCVR insert debounce timer running
XCVR link debounce timer not running
TX enable signal is on
Debounce timeout: 0.100 seconds
Link up : 506097 usecs after Wed May 12 22:38:08 2010
Link dn debounce start : 0 usecs after Thu Jan 1 00:00:00 1970
Link debounce end : 0 usecs after Thu Jan 1 00:00:00 1970
Counters:
Interrupt cntrs:
Bit error cntrs:
Bit Error Rate: 0x0000000000000000  Bit Error Rate (since link up): 0x00000000
Error blocks : 0x0000000000000043  Error blocks (since link up) : 0x00000011
Link cntrs:
Link up: 0x9(9)
Link dn: 0x0(0)
Link debounced with link up: 0x0(0)
Link debounced with link up since last enable: 0x0(0)
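
The Link cntrs values are cumulative, so a single snapshot says little; comparing two snapshots taken a few minutes apart shows whether the link is still flapping. A minimal Python sketch, assuming counter lines of the form name: 0xHEX(DEC) as shown above; the function names are hypothetical.

```python
import re

# Matches e.g. "Link up: 0x9(9)" or "Link debounced with link up: 0x0(0)".
_COUNTER = re.compile(r"^\s*(Link [\w ]+?)\s*:\s*0x([0-9a-fA-F]+)\((\d+)\)\s*$")


def link_counters(xcvr_output):
    """Return {counter name: value} for the Link counters in the xcvr output."""
    counters = {}
    for line in xcvr_output.splitlines():
        match = _COUNTER.match(line)
        if match is None:
            continue
        name, hex_value, dec_value = match.groups()
        value = int(hex_value, 16)
        if value != int(dec_value):
            raise ValueError(f"hex and decimal values disagree: {line!r}")
        counters[name] = value
    return counters


def counter_deltas(before, after):
    """Counters that changed between two snapshots, with the increase."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] != before.get(name, 0)}
```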

Enabling the Server link

After enabling the fabric port:
FarNorth-A(nxos)# show running-config interface ethernet1/1
version 4.1(3)N2(1.3)
interface Ethernet1/1
  switchport mode fex-fabric
  pinning server
  fex associate 1 chassis-serial FOX1327GKGN module-serial QCI132800SN module-slot left
  no shutdown
FarNorth-A(nxos)# show interface fex-fabric
        Fabric      Fabric        Fex                 FEX
Fex     Port        Port State    Uplink    Model         Serial
---------------------------------------------------------------
1       Eth1/1      Active        1         N20-C6508     QCI132800SN
1       Eth1/2      Active        2         N20-C6508     QCI132800SN
2       Eth1/5      Active        2         N20-C6508     QCI131600Z9
2                   Discovered    1         N20-C6508     QCI131600Z9
2       Eth1/6      Configured    1         N20-C6508     QCI131600Z9
2       Eth1/6      Fabric Up     0
2       Eth1/6      Active        1         N20-C6508     QCI131600Z9

Transition states: the last four rows show the states a fabric port goes through (Discovered, Configured, Fabric Up, Active) as it is brought up.

Fabric Port Management

FarNorth-A(nxos)# show fex 1 detail
FEX: 1 Description: FEX0001 state: Online
  FEX version: 4.1(3)N2(1.3) [Switch version: 4.1(3)N2(1.3)]
  FEX Interim version: 4.1(3)N2(1.2.168a)
  Switch Interim version: 4.1(3)N2(1.2.168a)
  Chassis Model: N20-C6508, Chassis Serial: FOX1327GKGN
  Extender Model: N20-I6584, Extender Serial: QCI132800SN
  Part No: 73-11623-04
  Card Id: 67, Mac Addr: 00:26:51:08:67:f4, Num Macs: 10
  Module SwGen: 21 [Switch SwGen: 21]
  pinning-mode: static Max-links: 1
  Fabric port for control traffic: Eth1/1
  Fabric interface state:                           <--- fabric ports
    Eth1/1 - Interface Up. State: Active
    Eth1/2 - Interface Up. State: Active
  Fex Port      State    Fabric Port    Primary Fabric      <--- "Fabric Port" is the pinned fabric port
  Eth1/1/1      Up       Eth1/1         Eth1/2
  Eth1/1/2      Up       Eth1/2         Eth1/2
  Eth1/1/3      Up       Eth1/1         Eth1/2
  Eth1/1/4      Up       Eth1/2         Eth1/2
  Eth1/1/7      Up       Eth1/1         Eth1/2
  Eth1/1/9      Up       Eth1/2         Eth1/2
Logs:                                               <--- FEX event history
[05/12/2010 22:38:28.273779] Module register received
[05/12/2010 22:38:28.276776] Registration response sent
[05/12/2010 22:38:28.546132] Module Online Sequence

Network Interface Virtualization (NIV)

Protocol negotiation with DCBX
The switch and adapter use the DCBX (an LLDP-based protocol) NIV TLV (Feature Type 7, Subtype 0) to:
indicate NIV capability
negotiate the control VNTAG for the virtual interface used by the adapter management entity

Initial protocol frames are non-VNTAG
All frames contain a VNTAG once negotiated

VIC protocol
Allocate/deallocate virtual interfaces (driven by the Interface Virtualizer)
Set VIF state (active/standby)
Virtual interface list management (driven by the switch)
MAC address registration (MAC filtering offload from adapter to switch)

DCBX Troubleshooting
Checking for DCBX negotiation results
In the dump of show platform software dcbx internal info interface ethernet1/1/1, look for every feature negotiation result, as shown below:
feature type 3 sub_type 0
feature state variables: oper_version 0 error 0 oper_mode 1 feature_seq_no 0 remote_feature_tlv_present 1 remote_tlv_not_present_notification_sent 0 remote_tlv_aged_out 0
feature register params max_version 0, enable 1, willing 0 advertise 1, disruptive_error 0 mts_addr_node 0x101 mts_addr_sap 0x1e5
Desired config cfg length: 1 data bytes: 08
Operating config cfg length: 1 data bytes: 08

Error
1) Indicates a negotiation error
2) Never expected to happen when connected to a CNA adapter
3) Can happen when two N5Ks are connected back-to-back
4) Can happen if PFC is enabled on different CoS values
Operating config
Indicates the negotiation result
Absence of the operating config indicates that the peer does not support this DCBX TLV, or that there was a negotiation error
remote_feature_tlv_present indicates whether or not the remote peer supports this feature TLV
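
The checks above can be applied to each feature block in turn. A minimal Python sketch; it assumes the field names shown in the dump, and for decoding the operating config it assumes that feature type 3 is PFC with a one-byte bitmap in which bit n enables PFC for CoS n, as in the CEE version of DCBX. Under that assumption, the 08 in the example means PFC is enabled on CoS 3. Function names are hypothetical.

```python
import re


def _int_field(block, name):
    """First integer following `name` in the block (spacing may vary)."""
    match = re.search(rf"\b{re.escape(name)}\s*(\d+)", block)
    return int(match.group(1)) if match else None


_OPERATING = re.compile(
    r"Operating\s*config\s*cfg\s*length:[ \t]*\d+[ \t]*data\s*bytes:[ \t]*"
    r"((?:[0-9a-fA-F]{2}[ \t]*)+)")


def evaluate_feature(block):
    """Summarize one DCBX feature block using the rules listed above."""
    operating = _OPERATING.search(block)
    operating_config = bytes.fromhex(operating.group(1)) if operating else None
    error = _int_field(block, "error")
    if error:
        verdict = "negotiation error"
    elif operating_config is None:
        verdict = "peer lacks this TLV or negotiation failed"
    else:
        verdict = "negotiated"
    return {
        "feature_type": _int_field(block, "feature type"),
        "error": error,
        "remote_tlv_present": _int_field(block, "remote_feature_tlv_present") == 1,
        "operating_config": operating_config,
        "verdict": verdict,
    }


def pfc_priorities(bitmap):
    """CoS values enabled in a PFC bitmap (assumes bit n = CoS n)."""
    return [cos for cos in range(8) if (bitmap >> cos) & 1]
```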

MAC Address Learning Functions

Server MAC addresses are learned via traffic generated by the server
Once learned, a server MAC address is static
Server MAC addresses are only learned on server ports
MAC address learning is disabled on border ports
Network-to-server traffic can only be forwarded (subject to RPF and déjà vu checks) if the server MAC address has already been learned on a server port
A server MAC address can move from one server port to another server port
A server MAC address can move outside the EH node. The old server MAC address is removed when a packet with the same source MAC is received on the original pinned border port (more on that later), e.g. when a VM has moved and generates a gratuitous ARP
The adapter can register MAC addresses with the switch
The switch offloads MAC address filtering from the adapter
Menlo adapters always register * (send all traffic to Menlo)
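
The learning rules above can be summarized as a small state model. This Python sketch is a simplified illustration, not how the fabric interconnect implements End Host Mode: it ignores VLANs, the RPF and déjà vu checks, and MAC registration, and the class and method names are hypothetical.

```python
class EndHostModel:
    """Simplified model of the End Host Mode MAC rules (illustration only)."""

    def __init__(self, server_ports, border_ports, pinning):
        self.server_ports = set(server_ports)
        self.border_ports = set(border_ports)
        self.pinning = dict(pinning)  # server port -> pinned border port
        self.table = {}               # MAC -> server port; entries never age out

    def receive(self, port, src_mac):
        """Process the source MAC of a frame arriving on `port`."""
        if port in self.server_ports:
            # Learn on server ports; a later frame on another server port moves it.
            self.table[src_mac] = port
        elif port in self.border_ports:
            # No learning on border ports. If the MAC shows up on the border port
            # its server port is pinned to, it has moved outside: remove it.
            owner = self.table.get(src_mac)
            if owner is not None and self.pinning.get(owner) == port:
                del self.table[src_mac]

    def server_port_for(self, dst_mac):
        """Server port for network-to-server traffic, or None (not forwarded)."""
        return self.table.get(dst_mac)
```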

Verifying End Host Mode Status and Configuration

Mac address table
FarNorth-A(nxos)# show mac-address-table
VLAN      MAC Address      Type     Age       Port
---------+-----------------+-------+---------+------------------------------
1         0025.b500.0004   static   0         veth1235
1         0025.b500.0007   static   0         veth1243
1         0025.b500.0008   static   0         veth1200
1         0025.b500.0009   static   0         veth1199
1         0025.b500.000c   static   0         veth1207
1         0025.b500.0017   static   0         veth1241
1         0025.b500.0018   static   0         veth1277
. <cut>
4044      0024.971f.6a45   dynamic  0         Eth1/1/9
4044      0024.971f.6b6f   dynamic  0         Eth1/1/9
4044      0024.971f.6b8d   dynamic  0         Eth2/1/9
4044      0024.971f.6da8   dynamic  0         Eth2/1/9
4044      0026.5108.67f2   dynamic  0         Eth1/1/9
4044      0026.5108.7de1   dynamic  0         Eth1/1/9
4044      0026.5108.ac59   dynamic  0         Eth1/1/9
4044      0026.5108.c9a1   dynamic  0         Eth2/1/9
1         0100.5e7f.fffa   igmp     0         Po2 veth1207
1         0100.5e7f.fffd   igmp     0         Po2 veth1277
200       0100.5e7f.fffa   igmp     0         veth1199 veth1200
Total MAC Addresses: 47

FarNorth-A(nxos)# show mac-address-table ?
  <CR>
  >             Redirect it to a file
  >>            Redirect it to a file in append mode
  address       Address
  aging-time    Display Aging Time (configured or default)
  count         Display only the count of MAC entries
  dynamic       Display Dynamic Entries
  interface     Interface
  multicast     Show Multicast MAC Table entries
  notification  Display Notification Information
  static        Display Static Entries
  vlan          VLAN
  |             Pipe command output to filter
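
On a large system the table is easier to review summarized: server MACs appear as static entries on veth interfaces, while other entries are dynamic or igmp. A minimal Python sketch, assuming the five-column layout shown above (multicast rows may list several ports); the function names are hypothetical.

```python
import re
from collections import Counter

_ROW = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+(\S+)\s+(\d+)\s+(.+?)\s*$",
    re.IGNORECASE)


def parse_mac_table(text):
    """Return (vlan, mac, type, age, [ports]) tuples from `show mac-address-table`."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            vlan, mac, entry_type, age, ports = match.groups()
            rows.append((int(vlan), mac.lower(), entry_type, int(age), ports.split()))
    return rows


def summarize(rows):
    """Count entries by type (static, dynamic, igmp, ...)."""
    return Counter(entry_type for _, _, entry_type, _, _ in rows)


def macs_on(rows, port):
    """MAC addresses whose port list includes `port`."""
    return [mac for _, mac, _, _, ports in rows if port in ports]
```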

Verifying End Host Mode Status and Configuration


running-config
UCS-HA-B(nxos)# show running-config interface ethernet1/9
interface Ethernet1/9
  switchport mode trunk
  switchport trunk allowed vlan 1
  pinning border
  no shutdown
UCS-HA-B(nxos)# show running-config interface veth681
interface vethernet681
  switchport trunk allowed vlan 1
  bind interface Ethernet1/1/5
  no pinning server sticky
  pinning server pinning-failure link-down

Verifying End Host Mode Status and Configuration


Server port pinning information
FarNorth-A(nxos)# show pinning server-interfaces
---------------+-----------------+------------------------+-----------------
SIF Interface   Sticky            Pinned Border Interface  Pinned Duration
---------------+-----------------+------------------------+-----------------
Eth1/1          Yes
Eth1/2          Yes
Eth1/5          Yes
Eth1/6          Yes
veth1199        No                Po2                      2d53:9:57
veth1200        No                Po2                      2d53:9:59
veth1207        No                Po2                      2d53:10:18
veth1235        No                Po2                      2d53:10:22
veth1241        No                Po2                      2d53:9:38
veth1243        No                Po2                      2d53:9:38
veth1277        No                Po2                      2d53:9:50
veth9395        Yes
veth9396        Yes
. <cut>
Total Interfaces : 37

Verifying End Host Mode Status and Configuration


Border port information
FarNorth-A(nxos)# show pinning border-interfaces
--------------------+---------+----------------------------------------
Border Interface     Status    SIFs
--------------------+---------+----------------------------------------
Po2                  Active    veth1199 veth1200 veth1207 veth1235
                               veth1241 veth1243 veth1277
Eth1/19              Down
Eth1/20              Down
Total Interfaces : 3

SAN Troubleshooting

Tracing a server FC connection

Determine the server's pWWN
Assigned through the service profile
Verify on the host that it matches:

Check the local FLOGI for that pWWN on UCS:
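
Host tools and the UCS CLI often print the same pWWN differently: Linux, for example, shows /sys/class/fc_host/host*/port_name as a 0x-prefixed hex string, while NX-OS uses colon-separated bytes. A minimal Python sketch for comparing them, assuming 8-byte (16 hex digit) WWNs; the function names are hypothetical.

```python
import re


def normalize_wwn(value):
    """Return a WWN as 8 lowercase colon-separated bytes, e.g. 24:f6:00:0d:ec:d0:7b:7f."""
    text = value.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    digits = re.sub(r"[^0-9a-f]", "", text)  # drop ':', '-', '.' and spaces
    if len(digits) != 16:
        raise ValueError(f"expected 16 hex digits in a WWN, got {value!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def same_wwn(a, b):
    """True if two WWN strings denote the same 8-byte name."""
    return normalize_wwn(a) == normalize_wwn(b)
```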
