Commit f9351d4

Merge branch 'main' into Add-Latest-Features-For-Autograd-Tutorial
2 parents 4feed23 + 2c4c99d commit f9351d4

1 file changed: +35 -37 lines changed

recipes_source/recipes/profiler_recipe.py

Lines changed: 35 additions & 37 deletions
@@ -105,22 +105,24 @@
 
 ######################################################################
 # The output will look like (omitting some columns):
-
-# --------------------------------- ------------ ------------ ------------ ------------
-# Name Self CPU CPU total CPU time avg # of Calls
-# --------------------------------- ------------ ------------ ------------ ------------
-# model_inference 5.509ms 57.503ms 57.503ms 1
-# aten::conv2d 231.000us 31.931ms 1.597ms 20
-# aten::convolution 250.000us 31.700ms 1.585ms 20
-# aten::_convolution 336.000us 31.450ms 1.573ms 20
-# aten::mkldnn_convolution 30.838ms 31.114ms 1.556ms 20
-# aten::batch_norm 211.000us 14.693ms 734.650us 20
-# aten::_batch_norm_impl_index 319.000us 14.482ms 724.100us 20
-# aten::native_batch_norm 9.229ms 14.109ms 705.450us 20
-# aten::mean 332.000us 2.631ms 125.286us 21
-# aten::select 1.668ms 2.292ms 8.988us 255
-# --------------------------------- ------------ ------------ ------------ ------------
-# Self CPU time total: 57.549m
+#
+# .. code-block:: sh
+#
+# --------------------------------- ------------ ------------ ------------ ------------
+# Name Self CPU CPU total CPU time avg # of Calls
+# --------------------------------- ------------ ------------ ------------ ------------
+# model_inference 5.509ms 57.503ms 57.503ms 1
+# aten::conv2d 231.000us 31.931ms 1.597ms 20
+# aten::convolution 250.000us 31.700ms 1.585ms 20
+# aten::_convolution 336.000us 31.450ms 1.573ms 20
+# aten::mkldnn_convolution 30.838ms 31.114ms 1.556ms 20
+# aten::batch_norm 211.000us 14.693ms 734.650us 20
+# aten::_batch_norm_impl_index 319.000us 14.482ms 724.100us 20
+# aten::native_batch_norm 9.229ms 14.109ms 705.450us 20
+# aten::mean 332.000us 2.631ms 125.286us 21
+# aten::select 1.668ms 2.292ms 8.988us 255
+# --------------------------------- ------------ ------------ ------------ ------------
+# Self CPU time total: 57.549m
 #
 
 ######################################################################
@@ -209,8 +211,6 @@
 # Self CPU time total: 23.015ms
 # Self CUDA time total: 11.666ms
 #
-######################################################################
-
 
 ######################################################################
 # (Note: the first use of XPU profiling may bring an extra overhead.)
@@ -220,28 +220,26 @@
 #
 # .. code-block:: sh
 #
-#------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
-# Name Self XPU Self XPU % XPU total XPU time avg # of Calls
-# ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
-# model_inference 0.000us 0.00% 2.567ms 2.567ms 1
-# aten::conv2d 0.000us 0.00% 1.871ms 93.560us 20
-# aten::convolution 0.000us 0.00% 1.871ms 93.560us 20
-# aten::_convolution 0.000us 0.00% 1.871ms 93.560us 20
-# aten::convolution_overrideable 1.871ms 72.89% 1.871ms 93.560us 20
-# gen_conv 1.484ms 57.82% 1.484ms 74.216us 20
-# aten::batch_norm 0.000us 0.00% 432.640us 21.632us 20
-# aten::_batch_norm_impl_index 0.000us 0.00% 432.640us 21.632us 20
-# aten::native_batch_norm 432.640us 16.85% 432.640us 21.632us 20
-# conv_reorder 386.880us 15.07% 386.880us 6.448us 60
-# ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
-# Self CPU time total: 712.486ms
-# Self XPU time total: 2.567ms
-
+# ------------------------------ ------------ ------------ ------------ ------------ ------------
+# Name Self XPU Self XPU % XPU total XPU time avg # of Calls
+# ------------------------------ ------------ ------------ ------------ ------------ ------------
+# model_inference 0.000us 0.00% 2.567ms 2.567ms 1
+# aten::conv2d 0.000us 0.00% 1.871ms 93.560us 20
+# aten::convolution 0.000us 0.00% 1.871ms 93.560us 20
+# aten::_convolution 0.000us 0.00% 1.871ms 93.560us 20
+# aten::convolution_overrideable 1.871ms 72.89% 1.871ms 93.560us 20
+# gen_conv 1.484ms 57.82% 1.484ms 74.216us 20
+# aten::batch_norm 0.000us 0.00% 432.640us 21.632us 20
+# aten::_batch_norm_impl_index 0.000us 0.00% 432.640us 21.632us 20
+# aten::native_batch_norm 432.640us 16.85% 432.640us 21.632us 20
+# conv_reorder 386.880us 15.07% 386.880us 6.448us 60
+# ------------------------------ ------------ ------------ ------------ ------------ ------------
+# Self CPU time total: 712.486ms
+# Self XPU time total: 2.567ms
 #
 
-
 ######################################################################
-# Note the occurrence of on-device kernels in the output (e.g. ``sgemm_32x32x32_NN``).
+# Note the occurrence of on-device kernels in the output (e.g. ``sgemm_32x32x32_NN`` for CUDA or ``gen_conv`` for XPU).
 
 ######################################################################
 # 4. Using profiler to analyze memory consumption
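
Reading aid for the tables in this diff (not part of the commit): the ``CPU time avg`` column appears to be ``CPU total`` divided by ``# of Calls``. The sketch below checks that relationship against a few rows copied from the CPU table in the first hunk. The helper names ``to_us`` and ``avg_us`` are hypothetical, and the 1 us tolerance is an assumption made to absorb the rounding of the printed values.

```python
# Rows copied from the CPU table in the diff:
# (name, "CPU total", "CPU time avg", "# of Calls").
ROWS = [
    ("aten::conv2d", "31.931ms", "1.597ms", 20),
    ("aten::batch_norm", "14.693ms", "734.650us", 20),
    ("aten::mean", "2.631ms", "125.286us", 21),
    ("aten::select", "2.292ms", "8.988us", 255),
]

# Microseconds per unit; "us" and "ms" are tested before the bare "s" suffix.
UNITS_TO_US = (("us", 1.0), ("ms", 1_000.0), ("s", 1_000_000.0))


def to_us(text):
    """Convert a duration string such as '1.597ms' to microseconds."""
    for suffix, factor in UNITS_TO_US:
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized duration: {text!r}")


def avg_us(total, calls):
    """Average time per call in microseconds (assumed: total / calls)."""
    return to_us(total) / calls


# Assumed tolerance: the table prints three decimals, so allow up to 1 us of drift.
TOLERANCE_US = 1.0

for name, total, avg, calls in ROWS:
    drift = abs(avg_us(total, calls) - to_us(avg))
    status = "ok" if drift <= TOLERANCE_US else "mismatch"
    print(f"{name}: computed {avg_us(total, calls):.3f}us, printed {avg}, {status}")
```

The same check can be applied to the XPU table by swapping in its ``XPU total`` and ``XPU time avg`` values.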
