WP VM Performance and Troubleshooting Esxtop
Performance and Troubleshooting with esxtop
1-800-COURSES www.globalknowledge.com
Steve Baca VCP, VCI, VCAP, Global Knowledge Instructor
Introduction
This paper introduces the esxtop utility and gives examples of how it can help address performance issues. First,
we will discuss the history of esxtop and show several different methods that can be used to start the monitoring
tool. Next, we will discuss how to use esxtop through interactive commands that can be typed while esxtop is
running. Finally, we will give an example of how to use esxtop by looking at how to interpret CPU data.
History
The esxtop command is based on the old UNIX command-line tool called top. It refreshes continuously, every five
seconds by default, displaying a snapshot of the processes running on an ESXi host. The top program has
been around since the mid-1980s and has been ported to many different versions of UNIX and Linux. Originally,
VMware ported a version of the UNIX top program and customized it to gather statistics for the ESX host; the
standard top program was included in the service console as well. When VMware changed the direction of its
hypervisor and removed the service console, esxtop remained a usable command-line utility within the
ESXi hypervisor, which runs on VMware's own proprietary kernel, the VMkernel. VMware also modified esxtop to
run remotely and called it resxtop. The resxtop utility runs within the vCLI and allows the user to connect
remotely to an ESXi host and run esxtop.
esxtop/resxtop
The resxtop command is used when you want to run esxtop remotely from the vSphere command-line interface
(vCLI), usually within the vSphere Management Assistant (vMA). The resxtop utility, also referred to as remote
esxtop, offers a secure method to run scripts across multiple ESXi hosts and virtual machines. This paper
concentrates on esxtop, since once resxtop is started all of the counters and fields are the same.
To start esxtop from a shell session on the ESXi host:
# export TERM=xterm
# esxtop
Once you launch esxtop, you will see the default screen (Figure 1), which includes callout descriptions of some
of the main host statistics and fields. The esxtop output can show more information than you need for the
performance or troubleshooting problem that you are addressing, so there are interactive commands that can be
issued to customize the display, as shown in Figure 3. Figure 1 is an example of the output generated
by esxtop or resxtop. Several screens can be viewed. The default screen is always the CPU view, as shown in
Figure 1, and the screen refreshes every five seconds by default. The esxtop utility displays statistics based
on worlds. A world is a schedulable entity, which other operating systems would call a process. Each virtual
machine has multiple worlds, depending on several factors: one world for each vCPU of the VM, a world for the
VM’s MKS (mouse, keyboard, screen), and a world for the virtual machine monitor (VMM) of the VM.
For example, to switch from the CPU view to the memory view, simply type the letter m. Figure 2 shows the
memory view.
Help Screen
To learn more about other options you can choose, type in h to get the help view for esxtop.
%USED = (total CPU used time at the second snapshot - total CPU used time at the first snapshot) / time
elapsed between snapshots
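As a worked illustration of the formula above, the following minimal sketch computes %USED from two snapshots. The function name, the use of seconds as the unit, and the multiplication by 100 to express the ratio as a percentage are assumptions made for this example; they are not taken from esxtop itself.

```python
def percent_used(used_first_s, used_second_s, elapsed_s):
    """Return %USED for one world over the interval between two snapshots.

    All arguments are in seconds. The factor of 100 converts the ratio
    from the formula into a percentage (an assumption about units).
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time between snapshots must be positive")
    return 100.0 * (used_second_s - used_first_s) / elapsed_s


# Hypothetical world: 12.0 s of accumulated CPU time at the first snapshot,
# 13.5 s at the second, with a 5-second refresh interval between them.
print(percent_used(12.0, 13.5, 5.0))  # 30.0

# Hypothetical world charged 6 s of CPU time within a 5 s interval.
print(percent_used(40.0, 46.0, 5.0))  # 120.0
```

The second call shows how a result above 100% can arise when more CPU time is charged to a world than the wall-clock time that elapsed, for example when a system service runs on a different PCPU on the world's behalf.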
To interpret the esxtop output, it helps to define the fields and counters that you are viewing.
World – A schedulable entity
ID – World Identifier
GID – World Group Identifier
NWLD – Number of Worlds for an entity
CPU Load Average – The mean CPU load over the last 1 minute, 5 minutes, and 15 minutes, based on 6-second
samples.
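The CPU load average definition can be illustrated with a small sketch. It assumes that one load sample is available every 6 seconds and that each average is the plain mean of the most recent samples in the window; how esxtop derives each individual sample is not covered here, and the function and variable names are hypothetical.

```python
SAMPLE_INTERVAL_S = 6  # one load sample every 6 seconds, per the definition


def load_averages(samples):
    """Return the mean load over the last 1, 5, and 15 minutes.

    samples: list of load values, oldest first, one per 6-second interval.
    If fewer samples exist than a window needs, the mean covers what is there.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    averages = {}
    for minutes in (1, 5, 15):
        window = minutes * 60 // SAMPLE_INTERVAL_S  # 10, 50, or 150 samples
        recent = samples[-window:]
        averages[minutes] = sum(recent) / len(recent)
    return averages


# Hypothetical 15 minutes of history: quiet, then moderate, then busy.
history = [0.2] * 100 + [0.5] * 40 + [1.0] * 10
for minutes, value in load_averages(history).items():
    print(f"{minutes:>2}-minute average: {value:.2f}")
```

With this made-up history, the 1-minute average reflects only the recent busy period, while the 15-minute average is pulled down by the earlier quiet samples, which is why the three values are read together.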
CPU Statistics
PCPU USED% – CPU utilization per physical CPU (includes logical CPUs)
%USED – CPU utilization. The percentage of physical CPU time accounted to the world.
It is possible that the %USED of a world can be greater than 100%, if the system service runs on a different
PCPU for this world.
If the %USED of a VM is high, that means the VM is using lots of CPU resources, which can be normal.
%RDY – The percentage of time the world was ready to run but was not given CPU resources. A world
in a run queue is waiting for the CPU scheduler to let it run on a PCPU. If %RDY of a VM is high, the VM
is possibly under resource contention. Check %MLMTD as well. If %MLMTD is high, you may raise
the CPU Limit setting for the VM. If %RDY - %MLMTD is high, the VM is under CPU contention.
%MLMTD – The percentage of time the world was ready to run but was deliberately not scheduled because
doing so would violate the CPU Limit setting. If %MLMTD of a VM is high, the VM cannot
run because of the CPU Limit setting.
%SYS – The percentage of time spent in the ESXi VMkernel processing interrupts and performing other system
services on behalf of the world.
%IDLE – The percentage of time the vCPU world is in an idle loop.
%CSTP – The percentage of time the vCPUs of a VM spend in the co-stopped state, waiting to be co-started.
%RUN – The percentage of total time the world was scheduled to run. If %RUN of a VM is high, the VM is using
lots of CPU resources, but that does not necessarily mean the VM is under resource constraint.
%WAIT – The percentage of time the world spent in the wait or idle state. This is the total wait time,
during which the world is waiting for some VMkernel resource. %WAIT can be high simply because many
worlds are waiting for events to happen, so the total wait time across all of them is large.
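The %RDY and %MLMTD guidance in the definitions above can be sketched as a simple check. The function name and return labels are hypothetical, and the default cutoff for "high" is an assumption borrowed from the 10% rule of thumb for %RDY given in the Summary; adjust it to suit your environment.

```python
def diagnose_ready_time(rdy, mlmtd, high=10.0):
    """Suggest why a world's ready time is high.

    rdy and mlmtd are the %RDY and %MLMTD values read from esxtop.
    high is an assumed cutoff (percent) for what counts as "high".
    """
    findings = []
    if mlmtd > high:
        # Ready but deliberately not scheduled because of the CPU Limit.
        findings.append("cpu_limit")
    if rdy - mlmtd > high:
        # Ready time not explained by the limit points to contention.
        findings.append("cpu_contention")
    return findings


# Hypothetical readings for three VMs.
print(diagnose_ready_time(rdy=25.0, mlmtd=22.0))  # ['cpu_limit']
print(diagnose_ready_time(rdy=25.0, mlmtd=1.0))   # ['cpu_contention']
print(diagnose_ready_time(rdy=3.0, mlmtd=0.0))    # []
```

A "cpu_limit" finding suggests reviewing the VM's CPU Limit setting, while "cpu_contention" suggests the host has more runnable worlds than PCPU time available. Both can be reported at once.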
Summary
The esxtop utility provides detailed performance data for an ESXi host. This real-time data gives the system
administrator information that aids in detecting performance issues. To better interpret esxtop data, it helps
to understand how to set up the esxtop view with the appropriate fields. When dealing with CPU performance
problems for a VM, one of the first fields to observe is %RDY. If this field is larger than 10%, it could mean that
you have more requests for CPU processing than resources available. Thus, %RDY time is the best indicator of
possible CPU performance issues.