Commit c416958

Expand section on profilers (perf and VTune)
1 parent d054d21 commit c416958

2 files changed

+214
-25
lines changed

talk/tools/profiling.tex

Lines changed: 214 additions & 25 deletions
@@ -4,39 +4,228 @@
44
\frametitle{Profiling}
55
\begin{block}{Conceptually}
66
\begin{itemize}
7-
\item take a measurement of a performance aspect of a program
7+
\item Take a measurement of a performance aspect of a program
88
\begin{itemize}
9-
\item where in my code is most of the time spent?
10-
\item is my program compute or memory bound?
11-
\item does my program make good use of the cache?
12-
\item is my program using all cores most of the time?
13-
\item how often are threads blocked and why?
14-
\item which API calls are made and in which order?
9+
\item Where in my code is most of the time spent?
10+
\item Is my program compute or memory bound?
11+
\item Does my program make good use of the cache?
12+
\item Is my program using all cores most of the time?
13+
\item How often are threads blocked and why?
14+
\item Which API calls are made and in which order?
1515
\item ...
1616
\end{itemize}
17-
\item the goal is to find performance bottlenecks
18-
\item is usually done on a compiled program, not on source code
17+
\item The goal is to find performance bottlenecks
18+
\item Usually done on a compiled program, not on source code
1919
\end{itemize}
2020
\end{block}
2121
\end{frame}
2222

2323
\begin{frame}[fragile]
24-
\frametitle{perf, VTune and uProf}
25-
\begin{block}{perf}
24+
\frametitle{\mintinline{bash}{perf} -- Performance analysis tools for Linux}
25+
\setlength{\leftmargini}{0pt}
2626
\begin{itemize}
27-
\item perf is a powerful command line profiling tool for linux
28-
\item compile with \mintinline{bash}{-g -fno-omit-frame-pointer}
29-
\item \mintinline{bash}{perf stat -d <prg>} gathers performance statistics while running \mintinline{bash}{<prg>}
30-
\item \mintinline{bash}{perf record -g <prg>} starts profiling \mintinline{bash}{<prg>}
31-
\item \mintinline{bash}{perf report} displays a report from the last profile
32-
\item More information in \href{https://perf.wiki.kernel.org/index.php/Main_Page}{this wiki}, \href{https://www.brendangregg.com/linuxperf.html}{this website} or \href{https://indico.cern.ch/event/980497/contributions/4130271/attachments/2161581/3647235/linux-systems-performance.pdf}{this talk}.
27+
\item Powerful command line profiling tool for Linux
28+
\item Not portable: the source code lives in the Linux kernel tree (\texttt{tools/perf})
29+
\item Much lower overhead than \mintinline{bash}{valgrind}
30+
\item To profile your code, compile with
31+
\texttt{CXXFLAGS="-O2 -g -fno-omit-frame-pointer"}
32+
\item Counting and sampling
33+
\begin{itemize}
34+
\item Counting -- count occurrences of a given event (e.g.\ cache misses)
35+
\item Time-based sampling -- sample the stack at regular time intervals
36+
\item Event-based sampling -- take a sample when an event counter overflows
37+
\item Instruction-based sampling -- sample instructions and precisely count events they create
38+
\end{itemize}
39+
\item Static and dynamic tracing
40+
\begin{itemize}
41+
\item Static -- pre-defined tracepoints in software (e.g.\ scheduling events)
42+
\item Dynamic -- tracepoints created dynamically with \mintinline{bash}{perf probe}
43+
\end{itemize}
3344
\end{itemize}
34-
\end{block}
35-
\begin{block}{Intel VTune and AMD uProf}
36-
\begin{itemize}
37-
\item Graphical profilers from CPU vendors with rich features
38-
\item Needs vendor's CPU for full experience
39-
\item More information on \href{https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html}{Intel's website} and \href{https://developer.amd.com/amd-uprof/}{AMD's website}
40-
\end{itemize}
41-
\end{block}
45+
\end{frame}
46+
47+
\begin{frame}[fragile]
48+
\frametitle{\mintinline{bash}{perf} commands}
49+
{ \scriptsize
50+
\begin{block}{}
51+
\begin{minted}{shell-session}
52+
$ perf
53+
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
54+
The most commonly used perf commands are:
55+
annotate Read perf.data and display annotated code
56+
c2c Shared Data C2C/HITM Analyzer.
57+
config Get and set variables in a configuration file.
58+
diff Read perf.data and display the differential profile
59+
evlist List the event names in a perf.data file
60+
list List all symbolic event types
61+
mem Profile memory accesses
62+
record Run a command and record its profile into perf.data
63+
report Read perf.data and display the profile
64+
sched Tool to trace/measure scheduler properties (latencies)
65+
script Read perf.data and display trace output
66+
stat Run command and gather performance counter statistics
67+
top System profiling tool.
68+
version display the version of perf binary
69+
probe Define new dynamic tracepoints
70+
trace strace inspired tool
71+
See 'perf help COMMAND' for more information on a specific command.
72+
\end{minted}
73+
\end{block}
74+
}
75+
\end{frame}
76+
77+
\begin{frame}[fragile]
78+
\frametitle{Listing events with \mintinline{bash}{perf list}}
79+
{ \scriptsize
80+
\begin{block}{}
81+
\begin{minted}{shell-session}
82+
$ # List main hardware events
83+
$ perf list hw
84+
85+
List of pre-defined events (to be used in -e):
86+
87+
branch-instructions OR branches [Hardware event]
88+
branch-misses [Hardware event]
89+
cache-misses [Hardware event]
90+
cache-references [Hardware event]
91+
cpu-cycles OR cycles [Hardware event]
92+
instructions [Hardware event]
93+
94+
$ # List main software/cache events
95+
$ perf list sw
96+
$ perf list cache
97+
98+
$ # List all pre-defined metrics
99+
$ perf list metric
100+
101+
$ # List all currently known events:
102+
$ perf list
103+
\end{minted}
104+
\end{block}
105+
}
106+
\end{frame}
107+
108+
\begin{frame}[fragile]
109+
\frametitle{Counting events with \mintinline{bash}{perf stat}}
110+
{ \scriptsize
111+
\begin{block}{}
112+
\begin{minted}{shell-session}
113+
$ # Standard CPU counter statistics for the specified command:
114+
$ perf stat <command>
115+
116+
$ # Detailed CPU counter statistics for the specified command:
117+
$ perf stat -d <command>
118+
$ perf stat -dd <command>
119+
120+
$ # Top-down microarchitecture analysis for the entire system, for 10s:
121+
$ perf stat -a --topdown -- sleep 10
122+
123+
$ # L1 data cache loads and load misses, every 1000 ms, for the command:
124+
$ perf stat -e L1-dcache-loads,L1-dcache-load-misses -I 1000 <command>
125+
126+
$ # Instructions per cycle and instruction-level parallelism, for command:
127+
$ perf stat -M IPC,ILP -- <command>
128+
129+
$ # Measure GFLOPs system-wide, until Ctrl-C is used to stop:
130+
$ perf stat -M GFLOPs
131+
132+
$ # Measure cycles and instructions 10 times, report results with stddev:
133+
$ perf stat -e cycles,instructions -r 10 -- <command>
134+
\end{minted}
135+
\end{block}
136+
}
137+
\end{frame}
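% Hedged sketch, not part of the original commit: the ratios one usually
% derives from the L1 data cache and cycles/instructions counters shown in
% the perf stat frame. The counter values in the last item are made up
% purely for illustration.
\begin{frame}
  \frametitle{Deriving rates from \texttt{perf stat} counts (sketch)}
  \begin{itemize}
    \item L1 data cache hit rate from \texttt{L1-dcache-loads} and \texttt{L1-dcache-load-misses}:
      \[
        \text{hit rate} = 1 - \frac{\text{L1-dcache-load-misses}}{\text{L1-dcache-loads}}
      \]
    \item Instructions per cycle from \texttt{instructions} and \texttt{cycles}:
      \[
        \text{IPC} = \frac{\text{instructions}}{\text{cycles}}
      \]
    \item Illustrative (made-up) numbers: $25\,000$ misses out of $1\,000\,000$ loads
      give $1 - 0.025 = 0.975$, i.e.\ a hit rate of 97.5\,\%;
      $3\,000\,000$ instructions in $2\,000\,000$ cycles give $\text{IPC} = 1.5$
  \end{itemize}
\end{frame}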
138+
139+
140+
\begin{frame}[fragile]
141+
\frametitle{Recording profiling information with \mintinline{bash}{perf record}}
142+
{ \scriptsize
143+
\begin{block}{}
144+
\begin{minted}{shell-session}
145+
$ # Sample on-CPU functions for the specified command, at 100 Hertz:
146+
$ perf record -F 100 -- <command>
147+
148+
$ # Sample CPU stack traces (via frame pointers), at 100 Hertz, for 10s:
149+
$ perf record -F 100 -g -- sleep 10
150+
151+
$ # Sample stack traces for PID using DWARF to unwind stacks, for 10s:
152+
$ perf record -p <PID> --call-graph=dwarf -- sleep 10
153+
154+
$ # Precise on-CPU user stack traces (constant skid) using PEBS (Intel CPUs):
155+
$ perf record -g -e cycles:up -- <command>
156+
157+
$ # Sample CPU stack traces using Instruction-based sampling (AMD CPUs):
158+
$ # (Note that you need to use system-wide sampling for IBS on AMD CPUs)
159+
$ perf record -a -g -e cycles:pp -- <command>
160+
161+
$ # Sample CPU stack traces once every 10k L1 data cache misses, for 5s:
162+
$ perf record -a -g -e L1-dcache-load-misses -c 10000 -- sleep 5
163+
164+
$ # Sample CPUs at 100 Hertz, and show top addresses and symbols, live:
165+
$ perf top -F 100
166+
\end{minted}
167+
\end{block}
168+
}
169+
\end{frame}
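% Hedged sketch, not part of the original commit: rough arithmetic relating
% the -F (frequency) and -c (period) options of perf record to the number
% of samples. The symbols f, T, P and n_cpu are introduced here for
% illustration; real counts are lower when CPUs are idle or samples are lost.
\begin{frame}
  \frametitle{How many samples to expect (sketch)}
  \begin{itemize}
    \item Frequency-based sampling with \texttt{-F} $f$ for $T$ seconds on $n_{\text{cpu}}$ busy CPUs:
      \[
        N_{\text{samples}} \approx f \cdot T \cdot n_{\text{cpu}}
      \]
      e.g.\ $100\,\text{Hz} \cdot 10\,\text{s} \cdot 1 = 1000$ samples
    \item Period-based sampling with \texttt{-c} $P$ takes one sample every $P$ events, so
      \[
        N_{\text{events}} \approx N_{\text{samples}} \cdot P
      \]
      e.g.\ $500$ samples with $P = 10\,000$ correspond to about $5 \cdot 10^{6}$ events
  \end{itemize}
\end{frame}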
170+
171+
\begin{frame}[fragile]
172+
\frametitle{Reporting and annotating source code with \mintinline{bash}{perf}}
173+
{ \scriptsize
174+
\begin{block}{}
175+
\begin{minted}{shell-session}
176+
$ # Standard reporting of perf.data in the text UI (TUI):
177+
$ perf report
178+
179+
$ # Report by self-time (excluding time spent in callees):
180+
$ perf report --no-children
181+
182+
$ # Report per source line of code (needs debugging info to work):
183+
$ perf report --no-children -s srcline
184+
185+
$ # Single inverted (caller-based) call-graph per command:
186+
$ perf report --inverted -s comm
187+
188+
$ # Text-based report per library, without call graph:
189+
$ perf report --stdio -g none -s dso
190+
191+
$ # Hierarchical report for functions taking at least 1% of runtime:
192+
$ perf report --stdio -g none --hierarchy --percent-limit 1
193+
194+
$ # Disassemble and annotate a symbol (instructions with percentages):
195+
$ # (Needs debugging information available to show source code as well)
196+
$ perf annotate <symbol>
197+
\end{minted}
198+
\end{block}
199+
}
200+
\end{frame}
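% Hedged sketch, not part of the original commit: one possible end-to-end
% session that chains the commands from the preceding frames. The file names
% myprog.cpp and myprog are hypothetical placeholders, and g++ is assumed as
% the compiler; the flags and perf options are the ones shown earlier.
\begin{frame}[fragile]
  \frametitle{A minimal \mintinline{bash}{perf} workflow (sketch)}
  { \scriptsize
    \begin{block}{}
      \begin{minted}{shell-session}
$ # Compile with optimizations, debug info and frame pointers
$ # (myprog.cpp and myprog are hypothetical names):
$ g++ -O2 -g -fno-omit-frame-pointer -o myprog myprog.cpp

$ # First look: detailed counter statistics for the whole run:
$ perf stat -d ./myprog

$ # Record stack traces at 100 Hertz into perf.data:
$ perf record -F 100 -g -- ./myprog

$ # Find the hottest functions by self-time:
$ perf report --no-children

$ # Drill down into one of them at the instruction level:
$ perf annotate <symbol>
      \end{minted}
    \end{block}
  }
\end{frame}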
201+
202+
\begin{frame}[fragile]
203+
\frametitle{Further information on \mintinline{bash}{perf}}
204+
\begin{itemize}
205+
\item Official documentation in the Linux repository at
206+
\href{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation}
207+
{linux/tools/perf/Documentation}
208+
\item Perf Wiki at \url{https://perf.wiki.kernel.org/}
209+
\item Linux \mintinline{bash}{perf} examples by Brendan Gregg
210+
\url{https://www.brendangregg.com/linuxperf.html}
211+
\item Scripts to visualize profiles as flamegraphs
212+
\url{https://github.com/brendangregg/FlameGraph}
213+
\item HSF Tools \& Packaging Working Group talk on Indico\\
214+
\href{https://indico.cern.ch/event/974382/}
215+
{Linux Systems Performance: Tracing, Profiling \& Visualization}
216+
\end{itemize}
217+
\end{frame}
218+
219+
\begin{frame}[fragile]
220+
\frametitle{Intel VTune Profiler}
221+
\centering
222+
\includegraphics[width=0.75\textwidth]{tools/vtune.png}
223+
\begin{itemize}
224+
\item Very powerful GUI-based profiler for Intel CPUs and GPUs
225+
\item Now free to use with
226+
\href{https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html}{Intel oneAPI Base Toolkit} or
227+
\href{https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html}{standalone}
228+
\item See the \href{https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/}
229+
{official online documentation} for more information
230+
\end{itemize}
42231
\end{frame}

talk/tools/vtune.png

165 KB