Enhancing The Monitoring Using Linux - 101112024111
Course Contents
• CPU: context switches, run queue, CPU utilization, and load average
• Memory
• I/O
• Network
• Lab on SAR (System Activities Statistics)
• Lab on tcpdump, a network packet analyzer
  • For example: number of packets received (transmitted) through the network card, statistics of packet failure, etc.
• lsof, a command available on many Linux/Unix-like systems that displays the list of all open files and the processes that opened them
At a very high level, the following four subsystems need to be monitored:
CPU
Memory
I/O
Network
CPU
You should understand the four critical performance metrics for CPU:
context switches, run queue, CPU utilization, and load average.
Context Switch
When the CPU switches from one process (or thread) to another, it is called
a context switch.
Context switching is a normal part of multitasking. However, a high rate of
context switching can cause performance issues.
You can view information about your process's context switches in /proc/<pid>/status.
$ pid=307
$ grep ctxt /proc/$pid/status
voluntary_ctxt_switches: 41
nonvoluntary_ctxt_switches: 16
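The same counters can be read from a script. The sketch below is a minimal example, assuming the status text has the "name: value" layout shown above; the function names are illustrative and not part of any tool.

```python
from pathlib import Path


def parse_ctxt_switches(status_text):
    """Extract the context-switch counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.endswith("ctxt_switches"):
            counters[key] = int(value)
    return counters


def read_ctxt_switches(pid="self"):
    """Read the counters for a live process (Linux only)."""
    return parse_ctxt_switches(Path(f"/proc/{pid}/status").read_text())


# Demonstration on the output shown above.
sample = "voluntary_ctxt_switches: 41\nnonvoluntary_ctxt_switches: 16\n"
print(parse_ctxt_switches(sample))
```

Calling read_ctxt_switches(307) twice, a few seconds apart, and subtracting the results gives the number of switches in that interval.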
Run Queue
The run queue indicates the total number of active (runnable) processes
currently queued for the CPU.
When the CPU is ready to execute a process, it picks one from the run
queue based on the priority of the process.
Please note that processes in the sleep state or in the I/O wait state are
not in the run queue.
The more processes there are in the run queue, the longer each one waits
for CPU time, so a long run queue can cause performance issues.
CPU Utilization
Load Average
This indicates the average CPU load over a specific time period.
On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15
minutes. This is helpful to see whether the overall load on the system is going up
or down.
For example, a load average of “0.75 1.70 2.10” indicates that the load on the
system is coming down. 0.75 is the load average in the last 1 minute. 1.70 is the
load average in the last 5 minutes. 2.10 is the load average in the last 15 minutes.
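The comparison in this example can be expressed as a small check. This is only a sketch: it assumes the three numbers arrive as text in the order 1, 5, 15 minutes (the first three fields of /proc/loadavg), and the function names are made up for illustration.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from text such as
    '0.75 1.70 2.10' (extra trailing fields are ignored)."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_trend(one, five, fifteen):
    """Classify the trend by comparing the most recent average with the older ones."""
    if one < five < fifteen:
        return "falling"
    if one > five > fifteen:
        return "rising"
    return "mixed"


print(load_trend(*parse_loadavg("0.75 1.70 2.10")))  # falling
```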
Please note that this load average is calculated by combining the number of
processes in the run queue and the number of processes in the
uninterruptible sleep state (typically waiting for I/O).
Memory
As you know, RAM is your physical memory. If you have 4GB RAM installed
on your system, you have 4GB of physical memory.
Virtual memory = Swap space available on the disk + Physical memory. The
virtual memory contains both user space and kernel space.
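As a rough illustration of the formula above, the totals can be taken from /proc/meminfo, which reports MemTotal and SwapTotal in kB. The sample values below are invented for the example, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])
    return fields


def virtual_memory_kb(fields):
    """Virtual memory as defined above: physical memory plus swap space."""
    return fields["MemTotal"] + fields["SwapTotal"]


# Hypothetical sample: about 4GB of RAM and 2GB of swap.
sample = """MemTotal:        4046848 kB
MemFree:          810420 kB
SwapTotal:       2097148 kB
"""
print(virtual_memory_kb(parse_meminfo(sample)), "kB")
```

On a live system, pass the contents of /proc/meminfo to parse_meminfo instead of the sample text.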
Whether the system is 32-bit or 64-bit makes a big difference in
how much memory a process can use.
On a 32-bit system a process can address at most 4GB of virtual
memory (2^32 bytes). On a 64-bit system the address space is so much
larger that this limit is not a practical concern.
Swap
Swap space in Linux is used when the amount of physical memory (RAM) is
full. If the system needs more memory resources and the RAM is full,
inactive pages in memory are moved to the swap space. While swap space
can help machines with a small amount of RAM, it should not be considered
a replacement for more RAM. Swap space is located on hard drives, which
have a slower access time than physical memory.
Swap space can be a dedicated swap partition (recommended), a swap file,
or a combination of swap partitions and swap files.
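To see which kind of swap areas a system uses, /proc/swaps lists one line per area with its type. The sketch below parses that layout; the sample text (device names and sizes) is invented, and parse_swaps is an illustrative name.

```python
def parse_swaps(text):
    """Parse /proc/swaps text into a list of swap areas (sizes in kB)."""
    areas = []
    for line in text.splitlines()[1:]:      # the first line is the column header
        fields = line.split()
        if len(fields) != 5:
            continue
        name, kind, size, used, priority = fields
        areas.append({
            "name": name,
            "type": kind,                   # "partition" or "file"
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(priority),
        })
    return areas


# Hypothetical sample: one swap partition combined with one swap file.
sample = """Filename                Type        Size    Used    Priority
/dev/sda2               partition   2097148 0       -2
/swapfile               file        1048572 512     -3
"""
for area in parse_swaps(sample):
    print(area["name"], area["type"], area["size_kb"])
```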
I/O
I/O wait is the amount of time the CPU spends waiting for I/O. If you see
consistently high I/O wait on your system, it indicates a problem in the disk subsystem.
You should also monitor reads per second and writes per second. These are measured
in blocks, i.e. the number of blocks read or written per second, and are also
referred to as bi and bo (blocks in and blocks out).
tps indicates total transactions per second, which is the sum of rtps (read
transactions per second) and wtps (write transactions per second).
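One way to compute the I/O wait percentage yourself is to sample the aggregate "cpu" line of /proc/stat twice and compare the counters. This is a sketch under the assumption that the fields follow the documented order (user, nice, system, idle, iowait, irq, softirq, steal); the sample numbers are invented.

```python
def cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as integers."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])          # user .. steal; guest time is already counted in user
    return 100.0 * deltas[4] / total if total else 0.0   # index 4 is iowait


# Hypothetical samples taken a few seconds apart.
first = cpu_times("cpu  100 0 50 800 50 0 0 0 0 0\n")
second = cpu_times("cpu  150 0 70 860 120 0 0 0 0 0\n")
print(iowait_percent(first, second))
```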
Network
For network interfaces, you should monitor total number of packets (and
bytes) received/sent through the interface, number of packets dropped, etc.
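These counters are exposed per interface in /proc/net/dev. The following sketch parses that file's layout; the interface names and numbers in the sample are invented, and parse_net_dev is an illustrative name.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev text into per-interface receive/transmit counters."""
    stats = {}
    for line in text.splitlines()[2:]:       # the first two lines are headers
        name, sep, counters = line.partition(":")
        if not sep:
            continue
        values = [int(value) for value in counters.split()]
        stats[name.strip()] = {
            "rx_bytes": values[0], "rx_packets": values[1], "rx_dropped": values[3],
            "tx_bytes": values[8], "tx_packets": values[9], "tx_dropped": values[11],
        }
    return stats


# Hypothetical sample in the /proc/net/dev layout.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0:  987654    4321    0    2    0     0          0         0   123456    2100    0    1    0     0       0          0
"""
for name, counters in parse_net_dev(sample).items():
    print(name, counters["rx_packets"], counters["tx_packets"], counters["rx_dropped"])
```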
Commands to manage performance issues in Linux Servers
The commands covered below include top, vmstat, iostat, free, and
sar. They can help you resolve performance issues quickly and easily.
top
vmstat
The ‘vmstat’ command gives a snapshot of current CPU, I/O, process and
memory usage. When given an interval in seconds, it keeps printing updated
statistics, much like the top command. For example, to report every 10 seconds:
$ vmstat 10
The -a option reports inactive and active memory in place of the buffer and
cache columns:
# vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 810420  97380  70628    0    0   115     4   89   79  1  6 90  3  0
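When vmstat output is collected by a script, it helps to pair each value with its column name. The sketch below does this for the output shown above, assuming the second line holds the column names and the last line holds the most recent sample; parse_vmstat is an illustrative name.

```python
def parse_vmstat(output):
    """Pair the column names (second line) with the values of the last line."""
    lines = output.strip().splitlines()
    names = lines[1].split()
    values = [int(value) for value in lines[-1].split()]
    return dict(zip(names, values))


sample = """procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 810420  97380  70628    0    0   115     4   89   79  1  6 90  3  0
"""
snapshot = parse_vmstat(sample)
# r is the run queue length, cs the context switches per second, wa the I/O wait percentage.
print(snapshot["r"], snapshot["cs"], snapshot["wa"])
```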
sar
Use the ‘sar’ command line tool to collect, view and record performance data.
This command is considerably more sophisticated than all the commands
discussed above. It can collect and display data over longer periods.
iostat
To identify whether I/O is causing system slowness you can use several commands, but
the easiest is the Unix command top: a high wa (I/O wait) percentage in its CPU
summary line points to an I/O bottleneck. iostat then shows the statistics per device.
free
The ‘free’ command shows memory statistics for both main memory and
swap. A total memory amount can be displayed by specifying the -t switch.
The amounts in bytes can also be displayed by specifying the -b switch and
megabytes using the -m switch (it displays in kilobytes by default).
free can also be run continuously using the -s switch with a delay specified
in seconds:
$ free -s 5
lsof is a command available on many Linux/Unix-like systems that displays a
list of all open files and the processes that opened them. The open files
include disk files, network sockets, pipes, and devices. One of the main reasons
for using this command is when a disk cannot be unmounted and reports
that files are in use or open. With this command you can
easily identify which files are in use. The most common form of the
command is:
$ lsof
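For the unmount case described above, the idea can be sketched without lsof itself: on Linux, each entry under /proc/<pid>/fd is a link to the file a process has open. The code below is an illustration only, not how lsof is implemented; open_files and files_under are hypothetical helper names.

```python
import os


def open_files(pid="self"):
    """Map each open file descriptor of a process to its target (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for fd in os.listdir(fd_dir):
        try:
            targets[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue                 # descriptor was closed in the meantime
    return targets


def files_under(mount_point, paths):
    """Keep only the paths that live on or below the given mount point."""
    root = mount_point.rstrip("/")
    return [p for p in paths if p == root or p.startswith(root + "/")]


# Hypothetical list of open paths; /mnt/data is the disk that will not unmount.
paths = ["/mnt/data/app.log", "/mnt/database/x.db", "/var/log/messages"]
print(files_under("/mnt/data", paths))
```

On a live system, files_under("/mnt/data", open_files(pid).values()) would show what a given process still holds open on that mount.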
SAR (System Activity Reporter)
Using sar you can monitor performance of various Linux subsystems (CPU,
Memory, I/O, Network Statistics) in real time.
Using sar, you can also collect all performance data on an ongoing basis,
store it, and do historical analysis to identify bottlenecks.
First, make sure sar is available on your system. sar is part of the sysstat
package; install it using the method appropriate for your distribution.
Once installed, verify the sar version using “sar -V”. At the time this
material was written, version 10 was the current stable version of sysstat.
$ sar -V
LinuxGuru@Server#sar -u 1 2
Linux 2.6.18-404.el5 04/09/17
sar -u                       Displays CPU usage for the current day, as collected up to that point.
sar -u 1 3                   Displays real-time CPU usage every 1 second, 3 times.
sar -u ALL                   Same as “sar -u” but displays additional fields.
sar -u ALL 1 3               Same as “sar -u 1 3” but displays additional fields.
sar -u -f /var/log/sa/sa10   Displays CPU usage for the 10th day of the month from the sa10 file.
If you have 4 cores on the machine and would like to see what the
individual cores are doing, use the -P option as follows.
“-P ALL” indicates that sar should display statistics for all the individual
cores.
LinuxGuru@Server#sar -P ALL 1 1
Linux 2.6.18-404.el5 04/09/17
LinuxGuru@Server#sar -r 1 3
Linux 2.6.18-404.el5 04/09/17
sar -P ALL                       Displays CPU usage broken down by all cores for the current day.
sar -P ALL 1 3                   Displays real-time CPU usage for all cores every 1 second, 3 times.
sar -P 1                         Displays CPU usage for core number 1 for the current day.
sar -P 1 1 3                     Displays real-time CPU usage for core number 1 every 1 second, 3 times.
sar -P ALL -f /var/log/sa/sa10   Displays CPU usage broken down by all cores for the 10th day of the month from the sa10 file.
sar -r
sar -r 1 3
sar -r -f /var/log/sa/sa10
tps – Transactions per second (this includes both read and write)
rtps – Read transactions per second
wtps – Write transactions per second
bread/s – Blocks read per second (a block is a 512-byte sector)
bwrtn/s – Blocks written per second
LinuxGuru@Server#sar -b 1 3
Linux 2.6.18-404.el5 04/09/17
sar -b
sar -b 1 3
sar -b -f /var/log/sa/sa10
Note: Use “sar -v” to display the number of inode handles, file handles,
and pseudo-terminals used by the system.
LinuxGuru@Server#sar -d 1 1
Linux 2.6.18-404.el5 04/09/17
10:41:07   DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
10:41:08   dev8-0    2.00     0.00    176.00     88.00      0.00   1.00   1.00   0.20
10:41:08   dev8-1    0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:41:08   dev8-2    2.00     0.00    176.00     88.00      0.00   1.00   1.00   0.20
10:41:08   dev8-16   0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:41:08   dev8-17   0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.20
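Output like the above is easier to scan once each row is keyed by column name. The sketch below parses the sample shown here, assuming 24-hour timestamps in the first column (a 12-hour locale adds an AM/PM field and would need adjusting); parse_sar_d is an illustrative name.

```python
def parse_sar_d(output):
    """Turn 'sar -d' rows into dictionaries keyed by the header names."""
    lines = [line.split() for line in output.strip().splitlines()]
    metrics = lines[0][2:]                    # names after the time and DEV columns
    rows = []
    for fields in lines[1:]:
        row = {"DEV": fields[1]}
        row.update({name: float(value) for name, value in zip(metrics, fields[2:])})
        rows.append(row)
    return rows


sample = """10:41:07 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
10:41:08 dev8-0 2.00 0.00 176.00 88.00 0.00 1.00 1.00 0.20
10:41:08 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:41:08 dev8-2 2.00 0.00 176.00 88.00 0.00 1.00 1.00 0.20
10:41:08 dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:41:08 dev8-17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.20
"""
# List the devices that actually serviced requests during the sample.
for row in parse_sar_d(sample):
    if row["tps"] > 0:
        print(row["DEV"], row["wr_sec/s"], row["await"])
```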
With the -p option (pretty print), the DEV column displays the actual device name
(for example sda, sda1, sdb1), as shown below.
LinuxGuru@Server#sar -p -d 1 1
Linux 2.6.18-404.el5 04/09/17
10:42:18   DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
10:42:19   sda    0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19   sda1   0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19   sda2   0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19   sdb    0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19   sdb1   0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
10:42:19   sdc    0.00     0.00      0.00      0.00      0.00   0.00   0.00   0.00
sar -d
sar -d 1 3
sar -d -f /var/log/sa/sa10
sar -p -d
“sar -q” reports the run queue size and the load average of the last 1 minute, 5
minutes, and 15 minutes. “1 3” reports every 1 second, a total
of 3 times.
Linux Performance Monitoring and Tuning
Introduction
LinuxGuru@Server#sar -q 1 3
Linux 2.6.18-404.el5 04/09/17
Note: The “blocked” column displays the number of tasks that are currently
blocked and waiting for I/O operation to complete.
sar -q
sar -q 1 3
sar -q -f /var/log/sa/sa10
“sar -n KEYWORD” reports network statistics. KEYWORD can be one of the following:
DEV – Displays vital statistics for network devices (eth0, eth1, etc.)
EDEV – Displays network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 – The IPv6 equivalents of the keywords above
ALL – Displays all of the above information. The output will be very long.
LinuxGuru@Server#sar -n DEV 1 1
Linux 2.6.18-404.el5 04/09/17
When you view historic sar data from the /var/log/sa/saXX file using the
“sar -f” option, it displays all the sar data for that specific day, starting
from 12:00 a.m.
Using the “-s hh:mm:ss” option, you can specify the start time. For example, if
you specify “sar -s 10:00:00”, it will display the sar data starting from
10 a.m. (instead of starting from midnight).
For example, to report the load average on the 26th of this month starting
from 10 a.m., combine the -q and -s options and read that day's file
with -f.
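The combination described here can be assembled programmatically. The helper below is a sketch with a made-up name; it only uses options that appear in this material (-f, -s, -e and the report flags) and assumes the daily files live in /var/log/sa as saDD.

```python
def sar_history_command(report_flag, day, start=None, end=None, log_dir="/var/log/sa"):
    """Build the sar argument list for reading one day's history file."""
    command = ["sar", report_flag, "-f", f"{log_dir}/sa{day:02d}"]
    if start:
        command += ["-s", start]     # start time, hh:mm:ss
    if end:
        command += ["-e", end]       # end time, hh:mm:ss
    return command


# Load average (-q) for the 26th, starting from 10 a.m.
print(" ".join(sar_history_command("-q", 26, start="10:00:00")))
# sar -q -f /var/log/sa/sa26 -s 10:00:00
```

The returned list can be handed to subprocess.run on a machine where sar is installed and the history file exists.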
CPU Utilization:
# sar -f /var/log/sa/sa11 -u 2 -s 06:30:00 -e 07:30:00
Linux 2.6.32-431.20.3.el6.s390x 11/10/16 _s390x_ (1 CPU)
06:40:01   vg_root-lv_swap   0.00   0.00   0.00   0.00   0.00    0.00    0.00   0.00
06:50:01   vg_root-lv_swap   0.00   0.00   0.00   0.00   0.00    0.00    0.00   0.00
07:00:01   vg_root-lv_swap   0.12   0.75   0.25   8.00   0.00   15.33    2.40   0.03
07:10:01   vg_root-lv_swap   0.12   0.21   0.72   8.00   0.00   17.29    6.57   0.08
07:20:01   vg_root-lv_swap   0.00   0.00   0.01   8.00   0.00   10.00   10.00   0.00
Average:   vg_root-lv_swap   0.05   0.19   0.20   8.00   0.00   16.23    4.45   0.00
Disk await is high during the same period of time, and the disk in question is the swap disk. The system is trying to access the
swap disk but cannot get to it promptly. So swap utilization is normal, but the system is unable to use the swap disk to swap in and swap out.
Tcpdump
# tcpdump -i eth0
# tcpdump -c 5 -i eth0
# tcpdump -D
1.eth0
2.eth1
To read and analyze the captured packet file 0001.pcap, use tcpdump with the -r
option, as shown below.
# tcpdump -r 0001.pcap
# tcpdump -n -i eth0
To capture packets for a specific port, for example port 22, specify the
port number in the tcpdump command.
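A port capture combines the interface option used earlier with a "port" filter expression. The builder below is a sketch with an illustrative name; it relies only on -i, -n, -c and the standard "port N" filter, and eth0 is the interface name used in the examples above.

```python
def tcpdump_command(interface, port=None, count=None, numeric=False):
    """Build a tcpdump argument list from the options covered in this section."""
    command = ["tcpdump", "-i", interface]
    if numeric:
        command.append("-n")                 # do not resolve addresses to names
    if count is not None:
        command += ["-c", str(count)]        # stop after this many packets
    if port is not None:
        command += ["port", str(port)]       # filter expression: this port only
    return command


print(" ".join(tcpdump_command("eth0", port=22)))
# tcpdump -i eth0 port 22
```

Running the resulting command normally requires root privileges, as with the other tcpdump examples here.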
High CPU Utilization
• The commands below can be used to find the processes consuming the
most CPU
• top
• ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 2 | head
High Memory Utilization
• top
• ps -eo pmem,pcpu,pid,args | tail -n +2 | sort -rnk 1 | head
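The same ranking can be done on captured ps output, which makes it explicit which column is being sorted. This is a sketch: top_processes is an illustrative name, and the sample processes and figures are invented.

```python
def top_processes(ps_output, key, limit=10):
    """Rank rows of 'ps -eo pmem,pcpu,pid,args' output by 'pmem' or 'pcpu'."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:      # skip the header line
        pmem, pcpu, pid, args = line.split(None, 3)
        rows.append({"pmem": float(pmem), "pcpu": float(pcpu),
                     "pid": int(pid), "args": args})
    return sorted(rows, key=lambda row: row[key], reverse=True)[:limit]


# Hypothetical ps output.
sample = """%MEM %CPU   PID COMMAND
 1.2  0.5   812 /usr/sbin/sshd -D
12.4  3.0  2290 /usr/bin/java -jar app.jar
 0.3 45.1  3177 gzip -9 backup.tar
"""
print(top_processes(sample, "pcpu", limit=1)[0]["args"])   # biggest CPU consumer
print(top_processes(sample, "pmem", limit=1)[0]["args"])   # biggest memory consumer
```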
Swap
How to Increase Swap in Linux
END of this Course Module.
Thanks