Latency Overhead of ROS2 for Modular Time-Critical Systems
Tobias Kronauer∗ , Joshwa Pohlmann∗ , Maximilian Matthé∗ , Till Smejkal† and Gerhard Fettweis∗
∗ Barkhausen Institute, Dresden, Germany, firstname.lastname@barkhauseninstitut.org
† Operating Systems Group, TU Dresden, Dresden, Germany, till.smejkal@tu-dresden.de
B. Parameter Space
[...] to the publisher frequency. The data size of the sensor reading is variable as well. In the ROS system, the payload of the message represents this quantity. The number of nodes, which includes the start and end node in a data-processing pipeline, can be modified as well. Moreover, we can change the Quality of Service settings. The aforementioned parameters remain [...]

C. Measurement Metrics
Latency is our main measurement metric. In accordance with the previous papers mentioned in Sec. II, the latency is defined between publishing a message and receiving the message, i.e. [...] want to profile the call stack between publishing and subscriber callback for evaluating the overhead of ROS2 core and the middleware interfaces. As statistical quantity, we choose median as it is also used by [16] and resilient to outliers.

performance_test benchmarks only ROS2 Dashing Di- [...]

[Figure: Publish / Subscriber Sequence. Publisher call chain with profiling points: Publisher::publish (RCLCPP_INTERPROCESS_PUBLISH), rcl_publish (RCL_PUBLISH), rmw_publish (RMW_PUBLISH), dds_write* (DDS_WRITE), to network. Subscriber, loop [forever]: Executor::wait_for_work, rcl_wait, rmw_wait, dds_wait** (from network), take_type_erased (RCLCPP_TAKE_ENTER), rcl_take (RCL_TAKE_ENTER), take_with_info (RMW_TAKE_ENTER), dds_take*** (DDS_TAKE_ENTER), handle_message (RCLCPP_HANDLE). Category labels: Publisher ROS2 Common, Publisher rmw, DDS, Subscriber rmw, Subscriber ROS2 Common.]
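The latency metric described under Measurement Metrics can be illustrated with a minimal sketch. It assumes that each message carries a publish timestamp and a receive timestamp in nanoseconds; the function name `latencies_us` and all timestamp values are hypothetical and not taken from the paper's benchmark code. The example contrasts the median with the mean to show why the median is resilient to outliers.

```python
from statistics import mean, median


def latencies_us(publish_ns, receive_ns):
    """Per-message latency in microseconds from paired nanosecond timestamps."""
    if len(publish_ns) != len(receive_ns):
        raise ValueError("timestamp lists must have the same length")
    return [(rx - tx) / 1000.0 for tx, rx in zip(publish_ns, receive_ns)]


# Hypothetical timestamps in nanoseconds (assumption, not measured data).
# The last message is a deliberate outlier with 5 ms latency.
publish_ns = [0, 10_000_000, 20_000_000, 30_000_000, 40_000_000]
receive_ns = [400_000, 10_420_000, 20_380_000, 30_410_000, 45_000_000]

samples = latencies_us(publish_ns, receive_ns)
median_latency = median(samples)  # barely affected by the single outlier
mean_latency = mean(samples)      # pulled upwards by the single outlier

print(f"median: {median_latency} us, mean: {mean_latency} us")
```

With these hypothetical values, one slow message more than triples the mean, while the median stays at a typical per-message latency.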
[Fig. 4 plot: three panels (FastRTPS, CycloneDDS, Connext); x-axis: Frequency [Hz] from 1 to 100; y-axis: median latency [us], up to 1,000 for FastRTPS and CycloneDDS and up to 30,000 for Connext.]
Fig. 4: Investigating the influence of publisher frequency on latency with three nodes. Evaluation is performed on the desktop PC, QoS reliability is set to BEST_EFFORT. Note the different scaling of the y-axis for Connext.
[Fig. 5 plot: panels FastRTPS, CycloneDDS, and Connext for 100 B (upper row) and 500 KB (lower row); x-axis: Nodes (3 to 23); y-axis: Median Latency [us]; series: 1 Hz, 40 Hz, 80 Hz, 100 Hz.]
Fig. 5: Investigation of the scalability of a node system. We used a payload of 100 B (upper row) and of 500 KB (lower row).
The DDS middleware was varied. For visualization purposes, only a few frequencies were picked. Evaluation was performed
on the desktop PC with QoS-reliability BEST_EFFORT. Note the different y-axis scaling for Connext.
[Fig. 6 plot: recoverable panel titles: FastRTPS, 500 KB; CycloneDDS, 500 KB; Connext, 500 KB; Raspberry Pi, FastRTPS, 100 B; Raspberry Pi, CycloneDDS, 100 B; x-axis ticks from 3 to 23; y-axis: Median Latency [us].]
Fig. 6: Categorization of intra-process profiling of different latencies. Evaluation was performed on the desktop PC with QoS
reliability BEST_EFFORT. Note the different scaling of the y-axis for Connext. The frequency is 100 Hz.
Therefore, higher latencies were already expected prior to the evaluation. We were told that significant improvements are to be expected for the upcoming Connext rmw release, which is in early release testing.

B. Evaluation of Scalability
We evaluate the median latency from start to end node for data-processing pipelines ranging from 3 to 23 nodes. As payload, we use 100 B and 500 KB. For visualization purposes, we restrict ourselves to a subset of the evaluated frequencies. Results are shown in Fig. 5.

We can observe the same pattern as in Fig. 4: the latency decreases if the frequency is increased. This effect is intensified with the length of the data-processing pipeline. For a payload of 100 B, the results indicate a linear relationship between the number of nodes and the median latency for FastRTPS and CycloneDDS. The outliers may be due to too few samples, which is the case for lower frequencies. Whether the relationship becomes more linear with a larger number of samples needs further investigation. Similar results were observed for the Raspberry Pi. However, the graph for CycloneDDS is more linear. One possible reason is the simpler hardware, which does not adapt to load as dynamically as the desktop PC.
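One way to quantify how linear the relationship between node count and median latency is would be an ordinary least-squares fit together with its coefficient of determination. The following is a minimal sketch under stated assumptions: the helper names `fit_line` and `r_squared` are hypothetical, and the latency values are made-up illustration data, not results from this evaluation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination; values near 1 indicate a near-linear trend."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical median latencies in microseconds per pipeline length
# (assumption for illustration only, not measured data).
nodes = [3, 7, 11, 15, 19, 23]
median_us = [310.0, 690.0, 1120.0, 1480.0, 1930.0, 2290.0]

slope, intercept = fit_line(nodes, median_us)
r2 = r_squared(nodes, median_us, slope, intercept)
print(f"latency per additional node: {slope:.1f} us, R^2 = {r2:.4f}")
```

In such a fit, the slope can be read as the latency added per additional node, and a coefficient of determination that drops for low frequencies would be consistent with the outliers attributed to small sample counts.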
In the case of Connext, a nonlinear relationship for smaller frequencies is visible. Furthermore, Connext only yields results up to 15 nodes, as the Connext rmw raises exceptions for higher node numbers. This is independent of the parameter set.

For a payload of 100 B, the largest amount of latency can be attributed to the categories DDS and Rclcpp Notification Delay, as seen in Fig. 6. Especially for Connext, the largest portion of the latency can be attributed to the DDS middleware itself. However, the overhead of ROS2 compared to raw DDS amounts to up to 50 % for small messages.

For a payload of 500 KB, one can clearly see that for all middlewares, the major part of the median latency is due to the DDS middleware. In the case of FastRTPS, the latency seems to increase nonlinearly. In addition, we can observe that the median latency entailed by Rclcpp Notification Delay is higher than in the case of 100 B, i.e., it is payload-dependent. The erratic behavior of Connext can be mainly attributed to the DDS middleware, but also to the categories Subscriber rmw and Publisher rmw.

As we focus on possible overhead reductions of ROS2 in this paper, we assume that the DDS middlewares cannot be changed. Under this assumption, major performance improvements can be obtained in the category Rclcpp Notification Delay. Similar results could be obtained for the Raspberry Pi with a higher latency, cf. Fig. 6.

D. Influence of QoS Reliability
In the last sections, the QoS reliability policy was set to BEST_EFFORT and kept as a constant parameter. Because localhost is used as network device, the network is not lossy, i.e., the influence of the policy will not be immediately visible. Therefore, we pick the highest possible throughput and calculate the relative deviation between the median latencies obtained with the QoS policies BEST_EFFORT and RELIABLE. As can be seen in Fig. 7, no trend is visible, i.e., we cannot simulate a lossy network with the available parameter sets. Similar results were obtained for the Raspberry Pi.

[Fig. 7 plot: Relative deviation of median between BEST_EFFORT and RELIABLE.]
Fig. 7: Evaluation of influence of QoS reliability settings on median latency. For FastRTPS and CycloneDDS, we choose 500 KB and 100 Hz. In the case of Connext, we have a payload of 500 KB and a publisher frequency of 40 Hz.

V. CONCLUSION AND FUTURE OUTLOOK
The goal of this paper was to provide the reader with simple guidelines if ROS2 is used for time-critical systems. Given our parameter set, we discovered the following rules of thumb:
• With a payload higher than the fragmentation size of UDP (here, 64 KB), latency increases with the payload size.
• The higher the frequency, the lower the latency.
• CycloneDDS yields the lowest latency.
• The DDS middleware and the delay between message notification and message retrieval by ROS2 contribute the biggest portions to the overall latency.
• The Connext rmw is highly suboptimal. In later releases, this will most likely change.
• Latency is larger on the Raspberry Pi; however, the qualitative results are the same. Fluctuation in latency is smaller than on the desktop PC.

During our evaluation, we observed that latency highly depends on energy saving features of the OS and the hardware. Our main focus was on the CPU. However, energy saving features of other components, e.g., the NIC, might play an important role. This needs to be taken carefully into consideration if latency is to be mitigated for real-time critical applications.

For the evaluation of network-independent ROS2 overhead, we created the nodes in one process on the same machine. This use case is unrealistic, as Intra-Process Communication would normally be used because it is much more efficient. As we use separate executors per node, there should not be much difference between creating the nodes in one process as opposed to creating nodes in separate processes. However, as pointed out by [27], the node to participant mapping is highly suboptimal in the Connext rmw. This might be the reason for the bad performance. Additionally, the Connext rmw is highly suboptimal in general, as thoroughly explained in Sec. IV-A. This will be fixed in future releases, as discussed with RTI.

In future work, one might consider an evaluation in a more realistic setup, i.e., with distributed systems. Network effects could then be better evaluated, and an effect of the QoS reliability policy should be observable. Aside from the median, one could evaluate other statistical quantities or verify if the messages follow a certain distribution. The obtained information could then be used as an additional uncertainty for state estimation and incorporated into the Kalman Filter.
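As a sketch of the comparison made in the QoS reliability evaluation, the relative deviation between the two median latencies could be computed as follows. The exact formula is not given in the text, so normalizing by the BEST_EFFORT median and reporting the result in percent are assumptions; the function name `relative_deviation_percent` and all latency samples are hypothetical.

```python
from statistics import median


def relative_deviation_percent(best_effort_us, reliable_us):
    """Relative deviation of the RELIABLE median from the BEST_EFFORT median.

    Assumed definition: 100 * (m_reliable - m_best_effort) / m_best_effort.
    """
    m_best_effort = median(best_effort_us)
    m_reliable = median(reliable_us)
    if m_best_effort == 0:
        raise ValueError("BEST_EFFORT median must be non-zero")
    return 100.0 * (m_reliable - m_best_effort) / m_best_effort


# Hypothetical latency samples in microseconds (assumption, not measured data).
best_effort_us = [980.0, 1000.0, 1020.0, 990.0, 5000.0]
reliable_us = [1090.0, 1100.0, 1110.0, 1095.0, 1300.0]

deviation = relative_deviation_percent(best_effort_us, reliable_us)
print(f"relative deviation: {deviation:.1f} %")
```

A positive value means RELIABLE is slower than BEST_EFFORT under this definition. On a lossless localhost setup, values scattered around a small constant without a trend would match the observation that the policy has no visible influence.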
REFERENCES
[1] OSRF, Community Metrics Report, 2019 (accessed August 17, 2020). [Online]. Available: http://download.ros.org/downloads/metrics/metrics-report-2019-07.pdf
[2] OSRF, ROS Robots, 2020 (accessed August 17, 2020). [Online]. Available: https://robots.ros.org/
[3] OSRF, Project Governance, 2020 (accessed August 17, 2020). [Online]. Available: https://index.ros.org/doc/ros2/Governance/#governance
[4] D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg, “Response-Time Analysis of ROS 2 Processing Chains Under Reservation-Based Scheduling,” in 31st Euromicro Conference on Real-Time Systems (ECRTS 2019), ser. Leibniz International Proceedings in Informatics (LIPIcs), S. Quinton, Ed., vol. 133. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019, pp. 6:1–6:23. [Online]. Available: http://drops.dagstuhl.de/opus/volltexte/2019/10743
[5] LGSVL, LGSVL Simulator, 2020 (accessed August 17, 2020). [Online]. Available: https://www.lgsvlsimulator.com/
[6] gazebo_ros2_control, 2020 (accessed August 17, 2020). [Online]. Available: https://github.com/gazebo_ros2_control
[7] OMG, Data Distribution Service, 2015 (accessed August 17, 2020). [Online]. Available: https://www.omg.org/spec/DDS/
[8] OSRF, Why ROS 2?, 2020 (accessed August 17, 2020). [Online]. Available: https://design.ros2.org/articles/why_ros2.html
[9] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source Robot Operating System,” in ICRA workshop on open source software, vol. 3, no. 3.2. Kobe, Japan, 2009, p. 5.
[10] OSRF, ROS on DDS, 2019 (accessed August 17, 2020). [Online]. Available: https://design.ros2.org/articles/ros_on_dds.html
[11] OSRF, About different ROS 2 DDS/RTPS vendors, 2020 (accessed August 17, 2020). [Online]. Available: https://index.ros.org/doc/ros2/Concepts/DDS-and-ROS-middleware-implementations/
[12] M. Naumann, F. Poggenhans, M. Lauer, and C. Stiller, “CoInCar-Sim: An Open-Source Simulation Framework for Cooperatively Interacting Automobiles,” in 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 1–6.
[13] M. A. Lema, A. Laya, T. Mahmoodi, M. Cuevas, J. Sachs, J. Markendahl, and M. Dohler, “Business case and technology analysis for 5G low latency applications,” IEEE Access, vol. 5, pp. 5917–5935, 2017.
[14] S. Maheshwari, D. Raychaudhuri, I. Seskar, and F. Bronzino, “Scalability and Performance Evaluation of Edge Cloud Systems for Latency Constrained Applications,” in 2018 IEEE/ACM Symposium on Edge Computing (SEC), 2018, pp. 286–299.
[15] F. Voigtländer, A. Ramadan, J. Eichinger, C. Lenz, D. Pensky, and A. Knoll, “5G for Robotics: Ultra-Low Latency Control of Distributed Robotic Systems,” in 2017 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2017, pp. 69–72.
[16] Y. Maruyama, S. Kato, and T. Azumi, “Exploring the performance of ROS2,” in Proceedings of the 13th International Conference on Embedded Software - EMSOFT ’16. Pittsburgh, Pennsylvania: ACM Press, 2016, pp. 1–10.
[17] M. Reke, D. Peter, J. Schulte-Tigges, S. Schiffer, A. Ferrein, T. Walter, and D. Matheis, “A Self-Driving Car Architecture in ROS2,” in 2020 International SAUPEC/RobMech/PRASA Conference, Jan. 2020, pp. 1–6.
[18] C. S. V. Gutiérrez, L. U. S. Juan, I. Z. Ugarte, and V. M. Vilches, “Towards a distributed and real-time framework for robots: Evaluation of ROS 2.0 communications for real-time robotic applications,” arXiv:1809.02595 [cs], Sep. 2018. [Online]. Available: http://arxiv.org/abs/1809.02595
[19] R. Morita and K. Matsubara, “Dynamic Binding a Proper DDS Implementation for Optimizing Inter-Node Communication in ROS2,” in 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Aug. 2018, pp. 246–247, ISSN: 2325-1301.
[20] Y.-P. Wang, W. Tan, X.-Q. Hu, D. Manocha, and S.-M. Hu, “TZC: Efficient Inter-Process Communication for Robotics Middleware with Partial Serialization,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nov. 2019, pp. 7805–7812, ISSN: 2153-0866.
[21] J. Kim, J. M. Smereka, C. Cheung, S. Nepal, and M. Grobler, “Security and Performance Considerations in ROS 2: A Balancing Act,” arXiv:1809.09566 [cs], Sep. 2018. [Online]. Available: http://arxiv.org/abs/1809.09566
[22] iRobot, ros2-performance, 2020 (accessed October 10, 2020). [Online]. Available: https://github.com/irobot-ros/ros2-performance
[23] ApexAI, performance_test, 2020 (accessed October 10, 2020). [Online]. Available: https://gitlab.com/ApexAI/performance_test/
[24] O. Robotics, Intra-process Communications in ROS 2, 2020 (accessed August 17, 2020). [Online]. Available: http://design.ros2.org/articles/intraprocess_communications.html
[25] Barkhausen-Institut, Benchmarking, 2020 (accessed October 10, 2020). [Online]. Available: https://github.com/Barkhausen-Institut/projects
[26] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” in 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2015, pp. 171–172.
[27] OSRF, Node to Participant mapping, 2020 (accessed August 17, 2020). [Online]. Available: http://design.ros2.org/articles/Node_to_Participant_mapping.html