AIX Performance Tuning
Jaqui Lynch
lynchj@forsythe.com
AGENDA
CPU
Memory tuning
Network
Starter Set of Tunables
Volume groups and filesystems
AIO and CIO for Oracle
Backup Slides
Performance Tools
4/1/2013
CPU
MONITORING CPU
User, system, wait and idle are fine for dedicated LPARs
They are not fine for SPLPAR or dedicated donating LPARs
You need to measure and charge back based on used CPU cycles
Moral of the story: use physc (physical consumed)
lparstat
Use with no flags to view partition configuration and processor usage
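Since physc is the number to watch, it is easy to pull out of lparstat for scripting. A minimal awk sketch; the sample line is illustrative (not from the deck) and the field positions assume the default lparstat column layout (%user %sys %wait %idle physc %entc ...):

```shell
# Sample data line in the default lparstat column order:
# %user %sys %wait %idle physc %entc lbusy vcsw phint
sample="0.2 0.4 0.0 99.4 0.98 97.5 0.5 519 0"
echo "$sample" | awk '{printf "physc=%s entc=%s%%\n", $5, $6}'
# -> physc=0.98 entc=97.5%
```

In a live script the sample line would be replaced by `lparstat 2 1 | tail -1`.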
LOGICAL PROCESSORS
Logical Processors represent SMT threads
[Diagram: logical (L) and virtual (V) processors per LPAR. With SMT on, vmstat shows lcpu=4 for a 2-core LPAR; with SMT off, lcpu=2. Dedicated LPARs (2 cores each) map straight to physical cores; shared LPARs (PU=1.2 weight=128 and PU=0.8 weight=192) run virtual processors (V=0.6, V=0.4) that the hypervisor dispatches onto the physical cores.]
UNDERSTAND SMT
Threads dispatch via a Virtual Processor (VP)
SMT1: Largest unit of execution work
SMT2: Smaller unit of work, but provides greater amount of execution work per cycle
SMT4: Smallest unit of work, but provides the maximum amount of execution work per cycle
On POWER7, a single thread cannot exceed 65% utilization
On POWER6 or POWER5, a single thread can consume 100%
Understand thread dispatch order
[Diagram: SMT thread dispatch order - primary (0), secondary (1), then tertiary threads]
Or throughput?
Volume over time or capacity
How many concurrent things can I push through
Affected by pipelining and SMT
Architect accordingly
Check for gating factors that could impact use of SMT
i.e. is there one thread that controls all work?
MORE ON DISPATCHING
4 milliseconds left, divided between still-running VMs according to weight factor, and run again
SCALED THROUGHPUT
P7 and P7+ with AIX v6.1 TL08 and AIX v7.1 TL02
Dispatches more SMT threads to a VP core before unfolding additional VPs
Tries to make it behave a bit more like P6
Raw provides the highest per-thread throughput and best response times, at the expense of activating more physical cores
Scaled provides the highest core throughput at the expense of per-thread response times and throughput. It also provides the highest system-wide throughput per VP because tertiary thread capacity is not left on the table.
schedo -p -o vpm_throughput_mode=
0 Legacy Raw mode (default)
1 Enhanced Raw mode with a higher threshold than legacy
2 Scaled mode, use primary and secondary SMT threads
4 Scaled mode, use all four SMT threads
Dynamic Tunable
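A minimal sketch of setting the mode with schedo; since the tunable is dynamic it takes effect without a reboot:

```shell
# Switch to scaled mode using primary and secondary SMT threads;
# -p makes the change persistent across reboots:
schedo -p -o vpm_throughput_mode=2
schedo -L vpm_throughput_mode   # verify current, default and boot values
```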
lsdev -Cc processor
lsattr -EL proc0
bindprocessor -q
sar -P ALL
topas, nmon
lparstat
vmstat (use -I or -v)
iostat
mpstat -s
lssrad -av
sar -P ALL output (physc and %entc columns):
physc  %entc
0.01   1.4
0.00   0.3
0.00   0.3
0.00   0.3
0.00   0.0
0.98   97.5
0.02   2.5
In the above, cpu4-6 are missing as they are 0, so sar did not print them to save space
mpstat -s 1 1
System configuration: lcpu=8 ent=1.0 mode=Uncapped

Proc0                              Proc4
2.26%                              0.01%
cpu0    cpu1    cpu2    cpu3       cpu4    cpu5    cpu6    cpu7
1.33%   0.31%   0.31%   0.31%      0.00%   0.00%   0.00%   0.01%
MEMORY
MEMORY TYPES
Persistent - backed by filesystems
Working storage - dynamic; includes executables and their work areas; backed by page space
CORRECTING PAGING
11173706 paging space I/Os blocked with no psbuf
lsps output on the above system, which was paging before changes were made to tunables:

lsps -a
Page Space  Physical Volume  Volume Group  Size     %Used  Active  Auto  Type
paging01    hdisk3           pagingvg      16384MB  25     yes     yes   lv
paging00    hdisk2           pagingvg      16384MB  25     yes     yes   lv
hd6         hdisk0           rootvg        16384MB  25     yes     yes   lv

What you want to see:

lsps -a
Page Space  Physical Volume  Volume Group  Size     %Used  Active  Auto  Type
paging01    hdisk3           pagingvg      16384MB  1      yes     yes   lv
paging00    hdisk2           pagingvg      16384MB  1      yes     yes   lv
hd6         hdisk0           rootvg        16384MB  1      yes     yes   lv

lsps -s
Total Paging Space  Percent Used
49152MB             1%

Should be balanced - NOTE: the VIO Server comes with 2 different sized page datasets on one hdisk (at least until FP24)
VIO Server
1 x 512MB and 1 x 1024MB page space, both on the same disk
Supposedly fixed if installing FP24, but not if upgrading
On my VIO:
# lsps -a
Page Space  Physical Volume  Volume Group  Size    %Used  Active  Auto  Type  Chksum
hd6         hdisk0           rootvg        4096MB  1      yes     yes   lv    0
minperm=3
Always try to steal from filesystems if filesystems are using more than 3% of memory
maxperm=90
Soft cap on the amount of memory that filesystems or network can use
Superset, so includes things covered in maxclient as well
maxclient=90
Hard cap on amount of memory that JFS2 or NFS can use
SUBSET of maxperm
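These three values can be applied together with vmo; a hedged sketch for AIX 6/7:

```shell
# Persistently set the filesystem cache caps (-p applies now and
# updates /etc/tunables/nextboot for reboots):
vmo -p -o minperm%=3 -o maxperm%=90 -o maxclient%=90
vmo -L minperm% -L maxperm% -L maxclient%   # verify
```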
PAGE_STEAL_METHOD
PAGE_STEAL_METHOD EXAMPLE
500GB memory
50% used by file systems (250GB)
50% used by working storage (250GB)
mempools = 5
So we have at least 5 LRUDs, each controlling about 100GB of memory
Set to 0: scans all 100GB of memory in each pool
Set to 1: scans only the 50GB in each pool used by filesystems
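The arithmetic in the example can be checked directly; a small sketch of the per-LRUD scan sizes:

```shell
# Per-LRUD scan size with page_steal_method=0 (physical scan of the
# whole pool) versus 1 (list-based scan of file pages only):
total_gb=500; mempools=5; file_pct=50
pool_gb=$((total_gb / mempools))        # 100GB per memory pool
file_gb=$((pool_gb * file_pct / 100))   # 50GB of file pages per pool
echo "method=0 scans ${pool_gb}GB per pool; method=1 scans ${file_gb}GB"
# -> method=0 scans 100GB per pool; method=1 scans 50GB
```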
LSSRAD -AV
SRAD = CPU + Memory group
MEM = Mbytes
CPU = LCPU number, assuming SMT4
REF1 indicates where the resource is relative to the process home:
REF1=0 SRAD=0 is local
REF1=0 SRAD=1 is near
Other REF values are far

[Example lssrad -av output: REF1 0 holds SRADs 0 and 2 and REF1 1 holds SRADs 1 and 3, with MEM values 88859.50, 36354.00, 42330.00 and 20418.00 and CPU ranges 0-15 20-23 28-31 36-39 44-47 52-55 60 and 16-19 24-27 32-35 40-43 48-51 56-59; a smaller LPAR shows CPU 0-7 and 8-11]
For AIX v6 or v7
Memory defaults are already set correctly, except minfree and maxfree
If you upgraded from a previous version of AIX using migration, then you need to check the settings afterwards
ioo -p -o j2_maxPageReadAhead=128
(default above may need to be changed for sequential)
j2_dynamicBufferPreallocation=16
Default that may need tuning
Replaces tuning j2_nBufferPerPagerDevice
Network changes in later slide
3 types of filesystem buffers - these are also pinned:
Filesystem - JFS
Client - NFS and VxFS
External Pager - JFS2

LVMO -A OUTPUT
this is rootvg
VMSTAT -V OUTPUT
The blocked I/O counters map to buffer types:
pbufs - pagespace
fsbufs - JFS
client fsbufs - NFS/VxFS
external pager fsbufs - JFS2
numclient=numperm, so most likely the I/O being done is JFS2 or NFS or VxFS
Based on the blocked I/Os it is clearly a system using JFS2
It is also having paging problems
pbufs also need reviewing
NETWORK
IFCONFIG
ifconfig -a output
en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 10.2.0.37 netmask 0xfffffe00 broadcast 10.2.1.255
tcp_sendspace 65536 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
These override no, so they will need to be set at the adapter.
Additionally you will want to ensure you set the adapter to the correct setting if it runs at less than Gb, rather than allowing auto-negotiate
Stop inetd and use chdev to reset the adapter (i.e. en0)
Or use chdev with -P and the changes will come in at the next reboot
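A hedged sketch of the chdev approach, using the 1Gb starting values from the interface table; en0 is the example adapter:

```shell
# Set interface-specific (ISNO) values on en0; -P defers the change
# to the next reboot so the interface does not have to be detached first:
chdev -l en0 -a tcp_sendspace=262144 -a tcp_recvspace=262144 -a rfc1323=1 -P
lsattr -El en0 | grep -E 'tcp_sendspace|tcp_recvspace|rfc1323'   # verify
```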
NETWORK
Suggested starting values by interface type:

Interface         Speed      MTU    tcp_sendspace  tcp_recvspace  rfc1323
lo0               N/A        16896  131072         131072         1
Ethernet          10/100 mb  1500   131072         165536         1
Ethernet          1000 (Gb)  1500   131072         165536         1
Ethernet          1000 (Gb)  9000   262144         131072         1
Ethernet          1000 (Gb)  1500   262144         262144         1
Ethernet          1000 (Gb)  9000   262144         262144         1
Virtual Ethernet  N/A        any    262144         262144         1
InfiniBand        N/A        2044   131072         131072         1
NETWORK COMMANDS
Compare to bandwidth (for 1Gbit: 948 Mb/s if simplex and 1470 if duplex)

OTHER NETWORK
If 10Gb network, check out Gareth's webinar
https://www.ibm.com/developerworks/wikis/download/attachments/153124943/7_PowerVM_10Gbit_Ethernet.pdf?version=1
netstat -v
To see settings and mbuf errors
lparstat 2
Look for high vcsw - an indicator that entitlement may be too low
Disabled by default
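A hedged sketch of those checks as commands:

```shell
# Scan the adapter statistics for drops, overflows and resource errors:
netstat -v | grep -i -E 'error|overflow|dropped'
# Watch entitlement: 5 samples, 2 seconds apart; high vcsw with
# %entc near 100 suggests entitlement is too low:
lparstat 2 5
```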
ENTSTAT -V
ETHERNET STATISTICS (ent18):
Device Type: Shared Ethernet Adapter
Elapsed Time: 44 days 4 hours 21 minutes 3 seconds

Transmit Statistics:                       Receive Statistics:
--------------------                       -------------------
Packets: 94747296468                       Packets: 94747124969
Bytes: 99551035538979                      Bytes: 99550991883196
Interrupts: 0                              Interrupts: 22738616174
Transmit Errors: 0                         Receive Errors: 0
Packets Dropped: 0                         Packets Dropped: 286155
                                           Bad Packets: 0
Max Packets on S/W Transmit Queue: 712
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 50

Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Broadcast Packets: 3227715                 Broadcast Packets: 3221586
Multicast Packets: 3394222                 Multicast Packets: 3903090
No Carrier Sense: 0                        CRC Errors: 0
DMA Underrun: 0                            DMA Overrun: 0
Lost CTS Errors: 0                         Alignment Errors: 0
Max Collision Errors: 0                    No Resource Errors: 286155  <- check those Tiny, etc. buffers
Late Collision Errors: 0                   Receive Collision Errors: 0
Deferred: 0                                Packet Too Short Errors: 0
SQE Test: 0                                Packet Too Long Errors: 0
Timeout Errors: 0                          Packets Discarded by Adapter: 0
Single Collision Count: 0                  Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 50
ENTSTAT -V - VIO SEA

Transmit Statistics:                       Receive Statistics:
--------------------                       -------------------
Packets: 83329901816                       Packets: 83491933633
Bytes: 87482716994025                      Bytes: 87620268594031
Interrupts: 0                              Interrupts: 18848013287
Transmit Errors: 0                         Receive Errors: 0
Packets Dropped: 0                         Packets Dropped: 67836309
                                           Bad Packets: 0
Max Packets on S/W Transmit Queue: 374
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Broadcast Packets: 1077222                 Broadcast Packets: 1075746
Multicast Packets: 3194318                 Multicast Packets: 3194313
No Carrier Sense: 0                        CRC Errors: 0
DMA Underrun: 0                            DMA Overrun: 0
Lost CTS Errors: 0                         Alignment Errors: 0
Max Collision Errors: 0                    No Resource Errors: 67836309

Virtual I/O Ethernet Adapter (l-lan) Specific Statistics:
---------------------------------------------------------
Hypervisor Send Failures: 4043136
Receiver Failures: 4043136
Send Errors: 0
Hypervisor Receive Failures: 67836309

No Resource Errors can occur when the appropriate amount of memory can not be added quickly to vent buffer space for a workload situation.
BUFFERS
Virtual Trunk Statistics - Receive Information - Receive Buffers:

Buffer Type        Tiny  Small  Medium  Large  Huge
Min Buffers        512   512    128     24     24
Max Buffers        2048  2048   256     64     64
Allocated          513   2042   128     24     24
Registered         511   506    128     24     24
History
Max Allocated      532   2048   128     24     24
Lowest Registered  502   354    128     24     24
BASICS
Data layout will have more impact than most tunables
Plan in advance
Large hdisks are evil
I/O performance is about bandwidth and reduced queuing, not size
10 x 50GB or 5 x 100GB hdisks are better than 1 x 500GB
Also larger LUN sizes may mean larger PP sizes, which is not great for lots of little filesystems
Need to separate different kinds of data, i.e. logs versus data
Default client qdepth for NPIV is set by the Multipath driver in the client
From: PE23 Disk I/O Tuning in AIX v6.1, Dan Braden and Steven Nasypany, October 2010
SAR -D
sar -d 2 6 shows:

device   %busy  avque  r+w/s  avwait  avserv
hdisk7   0      0.0    2      0.0     1.9
hdisk8   19     0.3    568    23.5    2.3
hdisk9   2      0.0    31     0.0     0.9

avwait - time waiting in the wait queue (ms)
avserv - I/O service time when sent to disk (ms)
IOSTAT -D

xfer:   %tm_act  bps   tps    bread  bwrtn
        8.8      3.4M  148.4  409.9  3.4M
read:   rps    avgserv  minserv  maxserv  timeouts  fails
        0.0    0.2      0.2      0.2      0         0
write:  wps    avgserv  minserv  maxserv  timeouts  fails
        148.3  3.6      0.2      632.2    0         0
queue:  avgtime  mintime  maxtime  avgwqsz  avgsqsz  sqfull
        24.5     0.0      631.7    4.0      0.0      83.2

Fields to watch: tps, avgserv (read/write service times), avgtime and avgwqsz (queue)
INTERACTIVE NMON D
adjust max_xfer_size
adjust num_cmd_elems
If using NPIV make changes to VIO and client, not just VIO
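A hedged sketch of making those adapter changes with chdev, using the values shown on the VIO server slide; with NPIV, repeat on the client LPAR's virtual FC adapter:

```shell
# Raise the FC adapter queue depth and transfer size; -P defers to the
# next reboot, since a busy adapter cannot be changed in place:
chdev -l fcs0 -a num_cmd_elems=1024 -a max_xfer_size=0x200000 -P
lsattr -El fcs0 -a num_cmd_elems -a max_xfer_size   # verify
```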
ADAPTER TUNING
fcs0
bus_intr_lvl   115
bus_io_addr    0xdfc00
bus_mem_addr   0xe8040000
init_link      al
intr_priority  3
lg_term_dma    0x800000
max_xfer_size  0x100000
num_cmd_elems  200
pref_alpa      0x1
sw_fc_class    2
VIO SERVER
#lsattr -El fcs0
lg_term_dma    0x800000
max_xfer_size  0x200000
num_cmd_elems  1024
STARTER SET OF TUNABLES

                               DEFAULTS                      NEW
PARAMETER                      AIXv5.3  AIXv6     AIXv7      SET ALL TO
NETWORK (no)
rfc1323                        0        0         0          1
tcp_sendspace                  16384    16384     16384      262144 (1Gb)
tcp_recvspace                  16384    16384     16384      262144 (1Gb)
udp_sendspace                  9216     9216      9216       65536
udp_recvspace                  42080    42080     42080      655360
MEMORY (vmo)
minperm%                       20       3         3          3
maxperm%                       80       90        90         90 (JFS, NFS, VxFS, JFS2)
maxclient%                     80       90        90         90 (JFS2, NFS)
lru_file_repage                1        0         0          0
lru_poll_interval              ?        10        10         10
minfree                        960      960       960        calculation
maxfree                        1088     1088      1088       calculation
page_steal_method              0        0/1 (TL)  1          1
JFS2 (ioo)
j2_maxPageReadAhead            128      128       128        as needed
j2_dynamicBufferPreallocation  16       16        16         as needed
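The network column above can be applied with no; a hedged sketch (interface-level ISNO values, if set, override these):

```shell
# Persistently apply the starter network tunables (-p also writes
# /etc/tunables/nextboot on AIX 6/7):
no -p -o rfc1323=1 -o tcp_sendspace=262144 -o tcp_recvspace=262144 \
   -o udp_sendspace=65536 -o udp_recvspace=655360
```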
noatime
Why write a record every time you read or touch a file?
mount command option
Use for redo and archive logs
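A hedged sketch of both forms; /oralogs is a hypothetical mount point, not from the deck:

```shell
# One-off mount with noatime:
mount -o noatime /oralogs
# Make it permanent by updating the stanza in /etc/filesystems:
chfs -a options=noatime /oralogs
```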
I/O BANDWIDTH
PCIe2 LP 8Gb 4-port Fibre HBA
Data throughput: 3200 MB/s FDX per port
IOPS: 200,000 per port
http://www.redbooks.ibm.com/technotes/tips0883.pdf
Can run at 2Gb, 4Gb or 8Gb
ORACLE
Asynchronous I/O and Concurrent I/O
AIO is used to improve performance for I/O to raw LVs as well as filesystems.
IOSTAT -A
iostat -A shows async IO activity

System configuration: lcpu=16 drives=15
aio: avgc avfc maxg maif maxr     avg-cpu: % user % sys % idle % iowait
     150       5652  0    12288   21.4

Disks:   % tm_act  Kbps    tps   Kb_read  Kb_wrtn
hdisk6   23.4      1846.1  3.3
hdisk5   15.2      1387.4  64.7
hdisk9   13.9      1695.9  10.6
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
posix_aio_fastpath = 1
posix_aio_fsfastpath = 1
posix_aio_kprocprio = 39
posix_aio_sample_rate = 5
posix_aio_samples_per_cycle = 6
1 aioPpool
1 aioLpool
AIO RECOMMENDATIONS
Oracle now recommending the following as starting points:

             AIX 5.3  AIX 6.1/7.1
minservers   100      3 (default)
maxservers   200      200
maxreqs      16384    65536 (default)
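On AIX 6.1 and later the AIO tunables moved under ioo; a hedged sketch of applying the starting points above:

```shell
# aio_maxservers is per logical CPU; aio_maxreqs is system-wide.
# -p persists the change across reboots:
ioo -p -o aio_maxservers=200 -o aio_maxreqs=65536
ioo -a | grep aio   # review all aio tunables and current values
```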
Performance will suffer due to requirement for 128kb I/O (after 4MB)
When to use it
Database DBF files, redo logs, control files and flashback log files
Not for Oracle binaries or archive log files
I give each instance its own filesystem and their redo logs are also separate
If not (i.e. 9i) then you will have to set the filesystem to use CIO in /etc/filesystems:
options = cio (/etc/filesystems)
disk_async_io = true (init.ora)
Do not put anything in the filesystem that the database does not manage - remember there is no inode lock on writes
Or you can use ASM and let it manage all the disk automatically
Also read Metalink Notes #257338.1, #360287.1
See Metalink Note 960055.1 for recommendations
Do not set it in both places (config file and /etc/filesystems)
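For the 9i case, the stanza ends up looking like this; a hypothetical sketch (filesystem name, LV and log device are examples, not from the deck):

```
/oraredo:
        dev     = /dev/oraredolv
        vfs     = jfs2
        log     = /dev/loglv00
        mount   = true
        options = cio,noatime
        account = false
```

Options set here apply at every mount; per the note above, do not also set CIO in the Oracle config file.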
USEFUL LINKS
AIX Wiki
https://www.ibm.com/developerworks/wikis/display/WikiPtype/AIX
HMC Scanner
http://www.ibm.com/developerworks/wikis/display/WikiPtype/HMC+Scanner
Workload Estimator
http://ibm.com/systems/support/tools/estimator
Performance Monitoring
https://www.ibm.com/developerworks/wikis/display/WikiPtype/Performance+Monitoring+Documentation
VIOS Advisor
https://www.ibm.com/developerworks/wikis/display/WikiPtype/Other+Performance+Tools#OtherPerformanceTools-VIOSPA
REFERENCES
Simultaneous Multi-Threading on POWER7 Processors by Mark Funk
http://www.ibm.com/systems/resources/pwrsysperf_SMT4OnP7.pdf
Rosa Davidson Back to Basics Part 1 and 2, Jan 24 and 31, 2013
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Power%20Systems/page/AIX%20Virtual%20User%20Group%20-%20USA
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en#/wiki/Power%20Systems/page/PowerVM%20technical%20webinar%20series%20on%20Power%20Systems%20Virtualization%20from%20IBM%20web
http://www.redbooks.ibm.com/redbooks/pdfs/sg247940.pdf
http://www.redbooks.ibm.com/redbooks/pdfs/sg247590.pdf
http://www.redbooks.ibm.com/redbooks/pdfs/sg248080.pdf
http://www.redbooks.ibm.com/redbooks/pdfs/sg248079.pdf
Redbook Tip on Maximizing the Value of P7 and P7+ through Tuning and Optimization
http://www.redbooks.ibm.com/technotes/tips0956.pdf
BACKUP SLIDES
PERFORMANCE TOOLS
TOOLS
ioo, vmo, schedo, vmstat -v
lvmo
lparstat, mpstat
iostat
topas, nmon
Check out Alphaworks for the Graphical LPAR tool
Ganglia - http://ganglia.info
Nmonrrd and nmon2web and pGraph
Commercial - IBM: PM for AIX, Performance Toolbox, Tivoli ITM
nmon analyzer
Windows tool, so need to copy the .nmon file over in ascii mode
Opens as an Excel spreadsheet and then analyses the data
Also look at nmon consolidator
sar
sar -A -o filename 2 30 >/dev/null
Creates a snapshot to a file - in this case 30 snaps 2 seconds apart
Must be post-processed on the same level of system
errpt
Check for changes from defaults
https://www.ibm.com/developerworks/wikis/display/WikiPtype/Other+Performance+Tools
OTHER TOOLS
filemon
filemon -v -o filename -O all
sleep 30
trcstop
NMON
[nmon screenshot slides (nmon V12)]
PERFORMANCE WIKI
VIOS ADVISOR
https://www.ibm.com/developerworks/wikis/display/WikiPtype/VIOS+Advisor
[VIOS Advisor screenshot slides]