Ye et al. Chinese Journal of Mechanical Engineering (2023) 36:144
https://doi.org/10.1186/s10033-023-00976-5

ORIGINAL ARTICLE Open Access

ROS2 Real-time Performance Optimization and Evaluation
Yanlei Ye1, Zhenguo Nie1,2, Xinjun Liu1,2*, Fugui Xie1,2, Zihao Li1 and Peng Li1

Abstract
Real-time interaction with uncertain and dynamic environments is essential for robotic systems to achieve functions such as visual perception, force interaction, spatial obstacle avoidance, and motion planning. Reliable and deterministic execution requires a flexible real-time control system architecture and interaction algorithm. The ROS framework was designed to improve the reusability of robotic software by providing a distributed structure, hardware abstraction, a message-passing mechanism, and application prototypes. Rich ecosystems for robotic development have been built around the ROS1 and ROS2 architectures on Linux. However, because the default Linux scheduler is designed around fairness and the kernel is complex, the system does not provide real-time guarantees. To balance real-time and non-real-time computing, this paper combines the transmission mechanism of ROS2 with the scheduling mechanism of the Linux operating system and uses Preempt_RT to enhance the real-time performance of ROS1 and ROS2. The real-time performance of ROS1 and ROS2 is evaluated from multiple perspectives, including throughput, transmission mode, quality of service (QoS), frequency, number of subscription nodes, and EtherCAT master. This paper makes two contributions: first, it uses Preempt_RT to optimize the native ROS2 system, improving the real-time performance of native ROS2 message transmission; second, it comprehensively evaluates the real-time performance of both native and optimized ROS2 systems. The comparison, illustrated with figures, shows the real-time benefits of the optimized ROS2 architecture.
Keywords ROS, Real-time system optimization, Preempt_RT, Real-time performance evaluation of ROS2

*Correspondence:
Xinjun Liu
xinjunliu@mail.tsinghua.edu.cn
1 State Key Laboratory of Tribology in Advanced Equipment, Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China
2 Beijing Key Lab of Precision/Ultra-Precision Manufacturing Equipments and Control, Tsinghua University, Beijing 100084, China

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

1 Introduction
Developing a ROS2 control system requires careful attention to real-time performance design and assurance. Industrial robots, aerospace equipment, medical robots, service robots, and military robots all impose strict real-time constraints. A real-time system is one that responds to events occurring in the environment within precise timing intervals [1]. Hence, optimizing and evaluating the real-time performance of ROS2 is crucial: it determines whether the system is usable for researchers and engineers and how ROS2 [2] can best be used in related research.
Numerous software concepts and architectures have been proposed in response to the difficulties of developing software for complex robot systems. In recent years, component-based and model-driven development have gradually been introduced into the construction of robot software systems to simplify development and improve quality. Modern robot control systems are typically designed as component-based distributed systems. Well-known examples include OROCOS [3], OpenHRP [4], YARP [5, 6], MRDS [7], Director [8], and ROS [9–13]. They all share the idea that complex robot systems should be composed of interacting software modules built from components.
The robot operating system (ROS) has become popular among researchers and engineers because of its streamlined, message-based, and tool-based design. However, its non-real-time system architecture prevents it from guaranteeing fault tolerance, deadlines, or process synchronization. Karamousadakis et al. [14] designed a quadruped robot based on the ROS1 system architecture and used Xenomai patches to optimize the native system. Despite this improvement, ROS still requires significant resources, including CPU, memory, network bandwidth, threads, and kernels, and it cannot manage these resources effectively to meet timing constraints.

Figure 1 Real-time extension methods based on the Linux kernel

The real-time robot operating system (RT-ROS) [15] creates a combined non-real-time/real-time task execution environment using the Linux and NuttX kernels. This improves the real-time performance of ROS, but it does not guarantee that ROS meets real-time constraints. Using RT-ROS also requires modifications to the ROS library and nodes, which makes it difficult to update and maintain quickly.
MICRO-ROS [16] is a variant developed specifically for resource-limited microcontrollers: a lightweight ROS client that can run on modern 32-bit microcontrollers such as the STM32. However, deploying dual-arm robots or large engineering projects on such microcontrollers is challenging because of their limited resources and computing power.
As the demand for translating research results into commercial products grows, the limitations of ROS1 as a fundamental research platform are becoming apparent, because it was not designed with real-time systems, small embedded platforms, non-ideal networks, cross-platform compatibility, or commercial productization in mind. ROS2, which uses the Data Distribution Service (DDS) [17, 18] for communication, can improve the real-time performance of message passing [19, 20], but this improvement targets only the latency between nodes (usually considered to be several hundred milliseconds). Ding et al. [21] systematically introduced the architecture of the ROS2 system and were among the first to analyze the ROS2 source code. Maruyama et al. [12] explored the real-time performance of ROS2 on the native kernel, evaluating it relative to ROS1 from multiple perspectives. Choi [22] proposed a priority-driven chain-aware scheduler that optimizes the real-time performance of ROS2 from a scheduling-strategy perspective, improving end-to-end latency. ROS2 itself builds on DDS and additional modules to provide distributed and real-time solutions. However, most of the ROS2 ecosystem is currently built around Linux, so the upper limit of real-time performance is determined by the operating system itself. ROS2 is commonly run on Ubuntu, which cannot guarantee the real-time performance the system requires (for example, a robot communication cycle of 1 ms with jitter below 200 µs). When the robot's trajectory is finely interpolated and the system cannot deliver data on time, the robot's joint motion becomes less smooth. It is therefore urgent to analyze real-time performance under the ROS2 architecture and to improve the real-time performance of the system.
Currently, popular commercial real-time systems include QNX Neutrino, ENEA OSE, Integrity, VxWorks, and Windows CE [23–26]. In addition, many open-source real-time systems, including CHAOS, MARS, Spring, ARTS, RK, TIMIX, MARUTI, HARTOS, YARTOS, HARTIK, Erika Enterprise, Shark, Marte OS, RTLinux, and FreeRTOS, are commonly used to handle real-time tasks in single-core, single-task scenarios [1, 27, 28]. However, they are weaker at handling multi-core tasks and at coexisting with non-real-time applications.
Linux is a popular choice among researchers and businesses because of its open-source nature, stability, reliability, fast-moving development, and large community. To leverage the powerful Linux ecosystem, which includes drivers, desktop environments, and human-computer interaction interfaces, while remaining compatible with the ROS architecture, the Linux kernel must be modified to achieve real-time performance. Two approaches are typically available: the dual-kernel approach (also known as PICO-KERNEL, NANO-KERNEL, or DUAL KERNEL) and the real-time patch approach, as shown in Figure 1. The dual-kernel approach includes Xenomai [29, 30] and RTAI [1, 31], while the real-time patch approach includes Preempt_RT [32] (Linux real-time patch, Linux configuration). To maintain a flexible architecture design and minimize changes to the original system code, this article
utilizes the Preempt_RT patch approach to optimize the real-time performance of the ROS2 architecture.
This article presents a comprehensive evaluation of the real-time performance of ROS1 and ROS2 data transmission on a Preempt_RT-optimized real-time system, which outperforms the native system. The real-time performance of ROS1 and ROS2 is compared from multiple perspectives, including throughput, control frequency, and multi-node subscription. Section 2 introduces the software and hardware operating environment of the system, Section 3 explains the real-time performance optimization based on Preempt_RT, Section 4 evaluates the real-time performance in detail, and the last section summarizes the results. This study provides insights for improving the real-time performance of ROS2 systems.

2 System Setup
The TH-Dual-Arm robot, developed by the Advanced Mechanism and Roboticized Equipment Lab at Tsinghua University, was the subject of this study. The control hardware architecture was implemented on a PC, as depicted in Figure 2. When designing the controller hardware, we took into account the requirements for computing power and storage, as well as the need for platform scalability, universality, and standardization. Table 1 lists some of the software used, and Table 2 lists the hardware. The system uses the EtherCAT bus communication protocol. Note that this paper does not analyze the motion performance of the control system; it only optimizes and evaluates real-time performance under this configuration.
The relevant components and software configurations are shown in Table 1 and Table 2. The Linux system used is Ubuntu 22.04, with Linux kernel version 5.15.55 and the Preempt_RT patch applied. The ROS1 version used is Noetic, and the ROS2 version is Humble, the latest long-term support (LTS) release at the time of writing, which is supported for five years.

Figure 2 PC-based control platform

Table 1 Software components of the controller
Item | Description | Version
Ubuntu | Linux distribution | 22.04
Linux kernel | Linux kernel | 5.15.55
Preempt_RT | Linux kernel patch | 5.15.55-rt48
ROS1 | First-generation robot operating system | Noetic
ROS2 | Second-generation robot operating system | Humble
EtherCAT master | Industrial Ethernet fieldbus | acontis

Table 2 Controller hardware system configuration
Item | Description | Quantity
Motherboard | Mini-ITX motherboard SD103-H110 by Taiwanese manufacturer DFI | 1
CPU | Intel i7-7700, 4 cores, 3.6 GHz | 1
Solid-state drive | 256 GB | 1
RAM | DDR4-3200, 32 GB | 2
Network interface controller | Intel I211 (1 Gbit/s) | 2
Network interface controller | Intel I219 (1 Gbit/s) | 2
Power supply | DC 24 V | 1

3 Real-time Optimization of ROS2 Based on Preempt_RT
Optimizing the real-time performance of the ROS2 system centers on enhancing the real-time capabilities of the operating system kernel. In this work, we first studied the Xenomai dual-kernel solution, whose basic principle is to run a microkernel and a native Linux kernel simultaneously. Real-time tasks are executed on the microkernel, which takes control of interrupts and manages them directly at the lowest level. When no real-time tasks are running on the microkernel, the Linux kernel is given an opportunity to run. Xenomai achieves real-time capabilities by running the real-time Cobalt kernel in parallel with the Linux kernel, as illustrated in Figure 3. However, we opted for the Preempt_RT patch approach to optimize the real-time performance of the ROS2 architecture because it keeps the architecture flexible and minimizes changes to the original system code.
The Cobalt microkernel manages critical timing activities, such as interrupt handling and scheduling of real-time threads. The Cobalt kernel has a higher priority than the native kernel, and the key to enhancing real-time performance lies in the Adaptive Domain Environment for
Operating Systems (ADEOS). ADEOS allows multiple identical or different kernels on the same system to share common hardware resources. In ADEOS, the Interrupt Pipeline (I-PIPE) manages and distributes interrupts between Linux and Xenomai, passing them along in domain priority order. Interrupts registered with the real-time kernel are processed immediately after they are generated, which guarantees the real-time performance of the system. Interrupts intended for Linux are first recorded and then processed only after the real-time task yields the CPU.

Figure 3 Xenomai Cobalt kernel architecture

To optimize the real-time performance of the native Linux kernel while fully utilizing the rich software of the Ubuntu system, this paper uses the Preempt_RT patch. Preempt_RT optimizes the native monolithic kernel by minimizing the amount of non-preemptible kernel code and the number of code changes needed to achieve preemption. In particular, critical sections, interrupt handlers, and interrupt-disable code sequences are modified to make them preemptible. The Preempt_RT patch makes full use of the Symmetric Multiprocessing (SMP) support of the Linux kernel to add this additional preemption without rewriting the kernel, as shown in Figure 4.

Figure 4 Architecture of Xenomai and Preempt_RT

The Preempt_RT patch provides preemptible critical sections, preemptible interrupt handlers, preemptible "interrupt disable" code sequences, kernel spinlocks, and semaphore priority inheritance, as well as measures to reduce latency. Modifications to the native kernel include high-resolution timers, threaded interrupt handlers, sleeping spinlocks, real-time mutexes, and RCU synchronization mechanisms. To evaluate the performance of the Preempt_RT patch, we installed Ubuntu 22.04 on an Intel x86_64 system with kernel version 5.15.55-generic, applied the Preempt_RT patch (patch-5.15.55-rt48.patch.gz), and applied the kernel optimizations listed in Table 3. Some visual modules were also trimmed.

Table 3 Kernel optimization
Item | Description
Preemption model | Fully Preemptible Kernel (Real-Time)
Timers subsystem | High-Resolution Timer Support
Timer tick handling | Full dynticks system (tickless)
Timer frequency | 1000 Hz
Default CPUFreq governor | Performance
C-state | Forbid

Figure 5 Node data transfer diagram of ROS2

To achieve high-accuracy timing in the nanosecond range, the clock_gettime(CLOCK_MONOTONIC, &ts_now) function can be used. For delays that require precise timing, the clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts_next, NULL) function is recommended.
For the non-critical nodes in publish-subscribe, client-server, and action client-server designs, this paper uses the CFS scheduler. CFS keeps runnable tasks in a red-black tree and allocates running time based on time slices and virtual runtime, as shown in Eqs. (1) and (2):

$$\text{time\_slice}_i = \text{sched\_period} \times \frac{\text{weight}_i}{\sum_{j \in \text{rq}} \text{weight}_j} \tag{1}$$

$$\text{vruntime}_i = \text{vruntime}_i + \frac{\text{weight}_{\text{nice0}}}{\text{weight}_i} \times \text{real\_runtime} \tag{2}$$

where weight_i is the load weight of task i (derived from its nice value), the sum in Eq. (1) runs over all runnable tasks j on the CFS run queue rq, weight_nice0 is the weight of a nice-0 task, and real_runtime is the time task i actually ran.
For the real-time nodes in the controller design, we use the SCHED_FIFO policy of the RT scheduler for thread control. SCHED_FIFO schedules tasks using a multi-level priority queue. Among tasks with the same priority, a SCHED_FIFO real-time task runs until it completes, voluntarily relinquishes the CPU, or is preempted by a higher-priority task.
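Eqs. (1) and (2) can be checked with a small numeric model. The sketch below is a simplified floating-point version, not the kernel's fixed-point code. It assumes the mainline Linux nice-to-weight values (1024 for nice 0, 820 for nice 1) and an illustrative 24 ms scheduling period, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Simplified floating-point model of Eqs. (1) and (2); the kernel itself uses
// fixed-point arithmetic, so its exact numbers differ slightly.
constexpr double kWeightNice0 = 1024.0;  // load weight of a nice-0 task in mainline Linux

// Eq. (1): task i receives a share of the period proportional to its weight
// relative to the total weight of all runnable tasks on the run queue.
double time_slice(double sched_period, const std::vector<double>& weights, std::size_t i) {
    assert(i < weights.size());
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    return sched_period * weights[i] / total;
}

// Eq. (2): virtual runtime advances more slowly for heavier (higher-priority) tasks,
// and exactly at wall-clock rate for a nice-0 task.
double advance_vruntime(double vruntime, double weight, double real_runtime) {
    return vruntime + (kWeightNice0 / weight) * real_runtime;
}
```

Consider two runnable tasks with weights 1024 and 820 and a 24 ms period. The nice-0 task receives about 13.33 ms and the nice-1 task about 10.67 ms. Running the nice-1 task for 10 ms advances its vruntime by about 12.49 ms. Because CFS picks the task with the smallest vruntime, it then tends to favor the nice-0 task.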
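The timing functions and the SCHED_FIFO policy described above are usually combined into one periodic loop, together with mlockall (which the test nodes in Section 4.3 also use). The following is a minimal sketch, not the authors' controller code, and it rests on these assumptions:
- The helper names are hypothetical.
- Real-time setup is attempted, but the loop still runs if the process lacks the required privileges.
- Each cycle records the wakeup latency (actual minus expected wakeup time), the quantity Cyclictest reports.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <limits>
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

constexpr std::int64_t kNsPerSec = 1000000000LL;

// Add a non-negative number of nanoseconds to a timespec, keeping tv_nsec normalized.
timespec timespec_add_ns(timespec t, std::int64_t ns) {
    assert(ns >= 0);
    const std::int64_t total = static_cast<std::int64_t>(t.tv_nsec) + ns;
    t.tv_sec += static_cast<time_t>(total / kNsPerSec);
    t.tv_nsec = static_cast<long>(total % kNsPerSec);
    return t;
}

// Signed difference a - b in nanoseconds.
std::int64_t timespec_diff_ns(const timespec& a, const timespec& b) {
    return (static_cast<std::int64_t>(a.tv_sec) - b.tv_sec) * kNsPerSec +
           (static_cast<std::int64_t>(a.tv_nsec) - b.tv_nsec);
}

struct LoopStats {
    long cycles = 0;
    std::int64_t min_ns = std::numeric_limits<std::int64_t>::max();
    std::int64_t max_ns = 0;
};

// Best effort: lock current and future memory, then switch the calling thread to
// SCHED_FIFO. Returns true only if both steps succeed (normally requires root or
// the CAP_IPC_LOCK / CAP_SYS_NICE capabilities).
bool try_enter_realtime(int priority) {
    bool ok = (mlockall(MCL_CURRENT | MCL_FUTURE) == 0);
    sched_param param{};
    param.sched_priority = priority;
    ok = (pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) == 0) && ok;
    return ok;
}

// Run `cycles` iterations on an absolute-deadline period and record wakeup latency.
LoopStats run_periodic(std::int64_t period_ns, long cycles) {
    LoopStats stats;
    timespec next{};
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long i = 0; i < cycles; ++i) {
        next = timespec_add_ns(next, period_ns);
        // TIMER_ABSTIME sleeps until `next`, so timing errors do not accumulate;
        // on EINTR the same absolute deadline is simply retried.
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {}
        timespec now{};
        clock_gettime(CLOCK_MONOTONIC, &now);
        const std::int64_t latency = timespec_diff_ns(now, next);  // actual - expected wakeup
        if (latency < stats.min_ns) stats.min_ns = latency;
        if (latency > stats.max_ns) stats.max_ns = latency;
        ++stats.cycles;
        // ... the periodic control work would run here ...
    }
    return stats;
}
```

A node would call try_enter_realtime once at startup, then run_periodic(1000000, n) for a 1 ms cycle. A priority such as 80 is an arbitrary illustrative value within the Linux SCHED_FIFO range of 1 to 99. An absolute deadline, unlike a relative sleep, keeps the period from drifting when the work in a cycle takes variable time. If try_enter_realtime returns false, the loop runs under CFS, and its latencies do not represent the Preempt_RT configuration.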

4 Real-time Performance Evaluation of ROS2

This study aims to keep the control system architecture stable during operation by maintaining real-time performance across different frequencies and loads. The analysis focuses on the jitter and latency caused by various factors, including frequency, data size, Quality of Service (QoS), and the Data Distribution Service (DDS), in both native and Preempt_RT-optimized ROS1 and ROS2 systems. Specifically, we investigate the latency characteristics of ROS1 and ROS2 and attempt to identify differences in their performance, covering both the end-to-end latency of individual nodes and the subscription latency of multiple nodes.
Nodes can exchange data through topics, services, and actions, as depicted in Figure 5. Each of these communication methods has its own message structure, which can be nested to exchange complex data between nodes. Moreover, each node can play multiple roles, and subscribers can be woken asynchronously to perform computations. Actions are commonly used in controller design for real-time feedback and for computing execution status. A distinctive feature of ROS2 is its decoupling of computation, which facilitates distributed node computing.

Figure 6 Measured latency time

4.1 Latency Evaluation Method
Cyclictest accurately and repeatedly measures the difference between the expected and actual wakeup times of threads, providing statistical information on system latency. It can measure latency caused by hardware, firmware, and the operating system, and is commonly used to measure kernel latency when assessing real-time kernel performance. The latency measured by Cyclictest consists of interrupt latency and scheduling latency, as shown in Figure 6. Interrupt latency is the time between the occurrence of an interrupt and the start of the interrupt service routine (ISR), and scheduling latency is the time a task takes to actually obtain the CPU after being woken up.
To test the real-time performance of the kernel, the master thread creates multiple real-time threads with specified priorities, and each real-time thread sets a timer to wake itself up periodically. When the timer expires, an interrupt is generated and the system enters the interrupt handler. The ISR calls wake_up_process() to wake the real-time process, and the scheduler then performs scheduling and dispatching. The total latency therefore includes the interrupt handling time and the scheduling latency. At the beginning of each loop, the current
time is calculated and passed to the master thread through shared memory for statistics and output. Inside the while loop, the thread repeatedly sleeps for its interval, wakes up, reads the current time, and computes the latency. The relevant code snippet is shown in Figure 7.

Figure 7 Calculating periodic latency

4.2 Real-time Performance of Native-Linux Kernel and Preempt_RT-Linux
The present study first evaluated the real-time performance of the native Linux kernel and of the kernel optimized with the Preempt_RT patch. For brevity, the native Linux kernel is abbreviated as "Native-Linux," and the kernel optimized with the Preempt_RT patch as "Preempt_RT-Linux." Load tests were performed in which Fourier transforms running on four CPUs brought CPU usage to nearly 100% (stress-ng -c 4 --cpu-method fft --timerfd-freq 1000000 -t 24h &), as shown in Figure 8. For the Native-Linux system, the test took 242.198 s, with a maximum latency of 6243 µs and an average latency of 3 µs, as shown in Figure 9. This is inadequate for high-precision motion equipment and robots, because the timing jitter of a 1 ms control cycle is usually required to be below 200 µs. For the optimized Preempt_RT-Linux system, five real-time threads were likewise launched, with periods ranging from 1000 to 3000 µs (see Table 4). A maximum latency of 82 µs and an average latency of 2 µs were observed during the 25.7 h test, as shown in Figure 10.

Figure 8 CPU load status

The comparison between Native-Linux and Preempt_RT-Linux is shown in Table 4. The real-time performance of Preempt_RT-Linux is significantly improved. Compared with Native-Linux, it has smaller minimum and average latency values and, notably, a far lower maximum latency.

4.3 Real-time Performance Evaluation of ROS1 and ROS2 under Different Data Sizes
This section examines the end-to-end latency between publishers and subscribers for data sizes ranging from 64 bytes to 16 megabytes, using string-type messages, and evaluates the latency characteristics of ROS1 and ROS2. In the hardware and software environment listed in Tables 1 and 2, the latency is measured from the timed publish function of a single publishing node to the callback function of another
subscribing node on the same computer, as illustrated in Figure 11. The nodes run at 10 Hz, and each data size is measured 120 times. From these runs, we obtain line graphs and the median latency of each group of data.
ROS1 uses TCPROS for reliable communication, while the corresponding reliable QoS policy is used in the ROS2 architecture. Fast DDS, which is released under the Apache License 2.0, is used as the DDS middleware. To measure real-time performance accurately, the nodes use the SCHED_FIFO scheduling policy and lock their memory with mlockall. SCHED_FIFO processes take precedence over CFS processes (CFS is the default Linux scheduling policy, used for processes without real-time requirements). mlockall locks the process's virtual address space into physical RAM, which prevents memory from being paged out to the swap area and reduces the latency caused by page faults. In ROS2, the QoS queue depth for publishers and subscribers is 100. The other QoS settings are as follows: the history policy is "keep history", the reliability is "reliable", and the durability is "volatile". The liveliness, deadline, lifespan, and lease duration are all set to "system default".

Figure 9 Timing latency of the native Linux kernel system

Figure 10 Real-time performance of Preempt_RT-Linux

Table 4 Comparison of real-time performance between Native-Linux and Preempt_RT-Linux
Item | Period (µs) | Native-Linux (µs) | Preempt_RT-Linux (µs)
Minimum | 1000 | 2 | 1
Minimum | 1500 | 2 | 1
Minimum | 2000 | 2 | 1
Minimum | 2500 | 2 | 1
Minimum | 3000 | 2 | 1
Maximum | 1000 | 3697 | 67
Maximum | 1500 | 4973 | 64
Maximum | 2000 | 6243 | 70
Maximum | 2500 | 3802 | 82
Maximum | 3000 | 3542 | 64
Average | 1000 | 3 | 1
Average | 1500 | 3 | 2
Average | 2000 | 3 | 2
Average | 2500 | 3 | 2
Average | 3000 | 3 | 1

Figure 11 Inter-process node message transmission and reception

Figure 12 illustrates the real-time performance of ROS1 and ROS2 on Native-Linux and Preempt_RT-Linux. The results indicate that the Preempt_RT-Linux optimization yields better real-time performance than Native-Linux. Additionally, the curves show that as data size increases (e.g., beyond 512 KB), ROS2 outperforms ROS1 in real-time performance, mainly because ROS2 uses DDS as its transmission method. However, as data size
increases, latency also increases because of message conversion and DDS processing, and DDS has a more significant impact on the transmission of larger data. ROS2 message transmission requires two message conversions between ROS2 and DDS: the first from ROS2 to DDS and the second from DDS to ROS2. These conversions consume time, and between them ROS2 calls the DDS API and passes the message to DDS.

Figure 12 Comparison of real-time performance of ROS1 and ROS2 before and after optimization

Figure 13 Real-time performance comparison of ROS1 and ROS2 for small data sizes

Figure 14 Comparison of ROS1 and ROS2 real-time performance (bar chart)

Figure 15 Comparison of the real-time performance of optimized ROS2

When small data (from 64 bytes to 64 KB) was transmitted in the experiment, ROS1 and ROS2 had comparable real-time performance, both before and after optimization. However, as shown in Figure 13, the Preempt_RT-optimized ROS2 system outperformed the native ROS2 system. For small data transfers, the conversion and transmission time between nodes and interfaces is relatively small, so the observed latency curve remains essentially flat.
Furthermore, the bar chart in Figure 14 shows that Preempt_RT-Linux-ROS2 has better real-time performance than Preempt_RT-ROS1 for large data transmission.
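The latency curves above summarize each data size by the median of 120 end-to-end samples (publish time to callback time). A minimal sketch of that summary step follows. It assumes the latencies are already collected in microseconds, and the struct and function names are hypothetical. It also reports the maximum and the max-min spread, an illustrative addition, because the comparisons above also concern maximum latency and fluctuation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LatencySummary {
    double median_us;
    double max_us;
    double spread_us;  // max - min, a simple measure of latency fluctuation (jitter)
};

// Summarize one group of end-to-end latency samples (e.g. 120 runs of one data size).
// Takes the vector by value because sorting reorders it.
LatencySummary summarize(std::vector<double> samples_us) {
    assert(!samples_us.empty());
    std::sort(samples_us.begin(), samples_us.end());
    const std::size_t n = samples_us.size();
    const double median = (n % 2 == 1)
        ? samples_us[n / 2]
        : (samples_us[n / 2 - 1] + samples_us[n / 2]) / 2.0;
    return {median, samples_us.back(), samples_us.back() - samples_us.front()};
}
```

The median is insensitive to isolated outliers. The maximum and spread expose those outliers, and for hard deadlines they are what matters.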
Figure 15 provides clear evidence that Preempt_RT-Linux-ROS2 outperforms Preempt_RT-ROS1 in real-time performance, particularly for large data transfers. As the transfer size increases, the advantage of Preempt_RT-Linux-ROS2 over ROS1 becomes even more pronounced.
Further analysis of the latency of each execution shows that Native-Linux-ROS2 has larger latency fluctuations for different data sizes, as shown in Figure 16. The larger the message, the more pronounced the latency fluctuations.

Figure 16 Real-time performance of Native-Linux-ROS2 with different packet sizes

Figure 17 Real-time performance of Native-Linux-ROS2 on small-scale data

Examining small data reveals that, for Native-Linux-ROS2, a smaller data size does not necessarily mean lower latency. As shown in Figure 17, a data size of 64 bytes resulted in a latency of more than 800 µs. Latency also depends on the real-time capabilities of the operating system.
After optimization, Preempt_RT-Linux-ROS2 shows smaller real-time fluctuations, and its maximum latency for different data sizes is much smaller than that of Native-Linux-ROS2, as shown in Figures 18 and 19.

Figure 18 Real-time performance of Preempt_RT-Linux-ROS2 with different sizes

4.4 Real-time Performance of Different DDS and QoS
ROS2 is built on top of DDS/RTPS middleware, which provides discovery, serialization, and transmission. DDS, as an end-to-end middleware, provides message-passing mechanisms and control over various "quality of service" (QoS) options. This section illustrates the impact of different DDS implementations and QoS settings on real-time performance. The study compares eProsima's Fast DDS, Eclipse's Cyclone DDS, and GurumNetworks' GurumDDS, as shown in Figure 20. The curves show that the latencies of the different DDS implementations are similar. Each implementation requires its specific RMW package and dependencies to be installed. Both C++ and Python nodes support the RMW_IMPLEMENTATION environment variable, which selects the RMW implementation used when running ROS2 applications. This variable can be set to a specific implementation identifier, such as rmw_fastrtps_cpp, rmw_cyclonedds_cpp, rmw_connextdds, or rmw_gurumdds_cpp. For instance:
RMW_IMPLEMENTATION=rmw_connextdds ros2 run demo_nodes_cpp talker
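DDS connects a publisher and a subscription only if their QoS settings are compatible under its requested-versus-offered rule. Under that rule, the publisher must offer at least the reliability and durability that the subscription requests. The sketch below models just those two policies with hypothetical enums; it is not the rclcpp API. The test applies it to the two profiles listed in Table 5.

```cpp
#include <cassert>

// Enumerators are ordered so that a stronger guarantee compares greater.
enum class Reliability { BestEffort = 0, Reliable = 1 };
enum class Durability { Volatile = 0, TransientLocal = 1 };

// Only two of the DDS QoS policies are modelled here.
struct QosProfile {
    Reliability reliability;
    Durability durability;
};

// Requested-vs-offered check: the publisher's offer must meet or exceed what the
// subscription requests for every modelled policy, otherwise no data flows.
bool compatible(const QosProfile& offered_by_publisher,
                const QosProfile& requested_by_subscription) {
    return offered_by_publisher.reliability >= requested_by_subscription.reliability &&
           offered_by_publisher.durability >= requested_by_subscription.durability;
}
```

Under this rule, a reliable, transient-local publisher can serve a best-effort, volatile subscriber. A best-effort, volatile publisher paired with a reliable, transient-local subscription never delivers messages. This is the kind of profile mismatch that prevents message delivery.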
Figure 19 Real-time performance of Preempt_RT-ROS2 on small-scale data

Figure 20 Real-time performance of different DDS in ROS2

ROS2 provides a rich variety of QoS policies for tuning communication between nodes. With an appropriate set of QoS policies, ROS2 can achieve reliable communication, similar to TCP, or best-effort transmission, similar to UDP, as well as various intermediate behaviors. Unlike ROS1, which mainly supports TCP communication, ROS2 benefits from the flexibility of the underlying DDS transport. In lossy wireless network environments, the best-effort policy is more suitable. In real-time computing systems, the QoS must be configured correctly to meet the final deadline. A combination of QoS policies forms a QoS profile. QoS profiles can be specified for publishers, subscribers, service servers, and clients, and can be applied independently to each instance of these entities. However, if incompatible profiles are used, messages may not be delivered. Different QoS policies affect the real-time performance of the system. We compared the reliable policy with the best-effort policy. The reliable policy helps ensure reliable transmission, whereas the best-effort policy does not guarantee delivery. Under the best-effort policy (with volatile durability), the subscriber node must be started before the publisher node begins sending messages to avoid losing the initial messages. In the test, the subscriber node was started first, followed by the publisher node.
Figure 21 shows the latency under different QoS policies. For small data sizes, the latencies of the best-effort and reliable policies are similar. As data size increases, the latency of the best-effort policy becomes smaller than that of the reliable policy. This is because the best-effort policy, like UDP, sends data without acknowledgment, whereas the reliable policy, like TCP, adds acknowledgment and retransmission. The QoS history for the reliable policy is KEEP_ALL with a depth of 100, and for the best-effort policy it is KEEP_LAST with a depth of 1. The specific policy settings are shown in Table 5.

Figure 21 Real-time performance of different QoS in ROS2

Table 5 Different QoS policies
Item | Reliable strategy | Best-effort strategy
History | KEEP_ALL | KEEP_LAST
Depth | 100 | 1
Reliability | Reliable | Best effort
Durability | Transient local | Volatile
Deadline | Default | Default

4.5 Real-time Performance of Different Transmission Methods
In design, applications are often composed of individual "nodes" that perform small tasks and are separated from other parts of the system. Such a design enables
fault isolation, faster development, program modularity, and code reuse, but often at the cost of performance.

Figure 22 Inter-process communication through shared memory
Figure 23 Real-time performance of different transport methods
Figure 24 Periodic execution of tasks
Figure 25 Periodic jitter at different frequencies on Native-Linux in ROS2
Figure 26 Periodic jitter at different frequencies on Preempt_RT-Linux

It is also possible to implement multiple nodes within a single process (intra-process), with the nodes exchanging messages through shared memory, as shown in Figure 22. In this case, DDS is not required. When std::unique_ptr is used for publishing and subscribing, zero-copy message transfer can be achieved through intra-process publish/subscribe connections, whereas DDS requires at least two message conversions (serialization on the publishing side and deserialization on the subscribing side). The address of the received message can be printed to verify this: printf("Address of the received message: 0x%" PRIXPTR "\n", reinterpret_cast<std::uintptr_t>(msg.get())); where PRIXPTR is the format macro from <cinttypes>.
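The address comparison can be reproduced without ROS 2. The following C++ sketch is a toy model, not rclcpp's intra-process implementation: the hypothetical IntraProcessChannel class simply queues messages inside one process, and Image is a hypothetical stand-in for a large message type. Publishing a std::unique_ptr moves ownership into the queue, so the subscriber receives the very object that was published, while publishing by const reference forces the channel to allocate a copy.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a large ROS 2 message; not a real rosidl type.
struct Image {
  std::vector<std::uint8_t> data;
};

// Toy single-process channel that models only message ownership,
// not the internals of rclcpp's intra-process manager.
class IntraProcessChannel {
 public:
  // Publishing by unique_ptr transfers ownership: the pointer itself is moved
  // into the queue, so the subscriber later receives the same object.
  void publish(std::unique_ptr<Image> msg) { queue_.push_back(std::move(msg)); }

  // Publishing by const reference cannot take ownership, so the channel has
  // to copy the message into a newly allocated object.
  void publish(const Image& msg) { queue_.push_back(std::make_unique<Image>(msg)); }

  // The subscriber takes ownership of the oldest queued message.
  std::unique_ptr<Image> take() {
    assert(!queue_.empty());
    std::unique_ptr<Image> msg = std::move(queue_.front());
    queue_.pop_front();
    return msg;
  }

 private:
  std::deque<std::unique_ptr<Image>> queue_;
};

// Compares the address seen by the publisher with the address seen by the
// subscriber, like the printf check described in the text.
bool unique_ptr_is_zero_copy() {
  IntraProcessChannel channel;
  auto msg = std::make_unique<Image>();
  msg->data.resize(1 << 20);  // 1 MiB payload
  const auto published = reinterpret_cast<std::uintptr_t>(msg.get());
  channel.publish(std::move(msg));
  const auto received = reinterpret_cast<std::uintptr_t>(channel.take().get());
  return published == received;
}

bool const_ref_is_zero_copy() {
  IntraProcessChannel channel;
  Image msg;
  msg.data.resize(1 << 20);  // 1 MiB payload
  const auto published = reinterpret_cast<std::uintptr_t>(&msg);
  channel.publish(msg);  // selects the const Image& overload, which copies
  const auto received = reinterpret_cast<std::uintptr_t>(channel.take().get());
  return published == received;
}
```

In this model unique_ptr_is_zero_copy() returns true and const_ref_is_zero_copy() returns false, mirroring the address check: only the ownership-transferring path avoids a copy.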
The publishing node and the subscribing node print the same address, indicating that the received message is the published message itself and not a copy. However, when const& or std::shared_ptr is used for publishing and subscribing, multiple copies are created.

To analyze the end-to-end latency of inter-process transmission (Figure 11) and shared memory transmission (Figure 22), Figure 23 plots the mean latency for each data size, computed over 120 measurements. For small data sizes, the latency of inter-process and shared memory transmission is similar because the benefit of shared memory is masked at small data sizes. As the data size increases, a significant difference in latency appears. Shared memory therefore provides an effective way to transmit large data sizes. It also avoids splitting a message into multiple data packets, which reduces end-to-end latency.
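A back-of-the-envelope calculation shows why avoiding fragmentation matters most for large messages. The example below assumes, purely for illustration, that one transport packet carries at most 65000 bytes of payload. The real limit depends on the DDS implementation and its transport settings. Under that assumption, a 1 KB message fits in a single packet, whereas a 4 MiB message (4 × 1024 × 1024 bytes) needs 65 fragments, all of which must arrive before the sample can be reassembled. The function name fragment_count is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of transport packets needed for a serialized message of
// message_bytes when one packet carries at most max_payload_bytes.
// max_payload_bytes is an assumed parameter, not a value from the paper.
std::size_t fragment_count(std::size_t message_bytes, std::size_t max_payload_bytes) {
  assert(max_payload_bytes > 0);
  if (message_bytes == 0) {
    return 1;  // even an empty sample occupies one packet
  }
  return (message_bytes + max_payload_bytes - 1) / max_payload_bytes;  // ceiling division
}
```

For example, fragment_count(4u * 1024u * 1024u, 65000) evaluates to 65, while fragment_count(1024, 65000) evaluates to 1. A shared-memory transport hands over the buffer without packetization, so this per-fragment overhead disappears.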
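The two QoS profiles of Table 5, and the incompatibility problem mentioned in the QoS discussion, can also be sketched in code. The block below is a plain-data model, not the rclcpp API. In rclcpp, the profiles would typically be built with rclcpp::QoS(rclcpp::KeepAll()).reliable().transient_local() and rclcpp::QoS(rclcpp::KeepLast(1)).best_effort().durability_volatile(). The hypothetical compatible function encodes only the DDS requested-versus-offered rule for reliability and durability; other policies such as deadline are ignored. In ROS 2 the depth setting is only honored with KEEP_LAST, so the depth of 100 has no effect under KEEP_ALL.

```cpp
#include <cassert>
#include <cstddef>

// Plain-data model of the Table 5 settings; NOT the rclcpp API.
enum class History { KeepLast, KeepAll };
enum class Reliability { BestEffort, Reliable };     // ordered weakest -> strongest
enum class Durability { Volatile, TransientLocal };  // ordered weakest -> strongest

struct QosProfile {
  History history;
  std::size_t depth;  // only honored with KeepLast in ROS 2
  Reliability reliability;
  Durability durability;
};

// The two profiles compared in Table 5.
constexpr QosProfile kReliableProfile{History::KeepAll, 100, Reliability::Reliable,
                                      Durability::TransientLocal};
constexpr QosProfile kBestEffortProfile{History::KeepLast, 1, Reliability::BestEffort,
                                        Durability::Volatile};

// Requested-vs-offered rule: a subscription matches a publisher only if the
// publisher offers at least the level that the subscription requests.
constexpr bool compatible(const QosProfile& offered_by_pub, const QosProfile& requested_by_sub) {
  return offered_by_pub.reliability >= requested_by_sub.reliability &&
         offered_by_pub.durability >= requested_by_sub.durability;
}

// Compile-time illustration of the two cross-profile cases.
static_assert(compatible(kReliableProfile, kBestEffortProfile),
              "a reliable, transient-local publisher can serve a best-effort subscriber");
static_assert(!compatible(kBestEffortProfile, kReliableProfile),
              "a best-effort publisher cannot serve a reliable subscriber");
```

With this rule the two Table 5 profiles match in one direction only. This is why the experiment configures publisher and subscriber with the same profile.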
4.6 Real‑time Characteristics at Different Frequencies
The control frequency has a significant impact on control effectiveness, including trajectory smoothness and computational refinement. This section discusses the impact of frequency on real-time performance. Each experiment recorded 10000 cycles. In the plots, the horizontal axis is the sample index and the vertical axis is the cycle jitter. The absolute deviation $d_k$ of the $k$-th activation time $t_k$ from its ideal time $kT$ was also measured:

$$d_k = t_k - kT \tag{3}$$

The cycle jitter $P_k$ is defined as the interval between activations $k$ and $k+1$ minus the nominal period $T$, as shown in Figure 24:

$$P_k = t_{k+1} - t_k - T = d_{k+1} - d_k \tag{4}$$

To better evaluate real-time performance, the CPU was run at full load. Figure 25 shows the cycle jitter at different frequencies on a Native-Linux system: the maximum jitter exceeds 400 µs and fluctuates strongly. The native system is therefore not real-time and cannot be used for multi-axis high-precision motion control.

For the optimized Preempt_RT-Linux system, the cycle jitter was analyzed at timing frequencies increased stepwise from 25 to 5000 Hz, with the corresponding curves shown in Figure 26. The jitter fluctuations were smaller, and the maximum cycle jitter was less than 60 µs. The real-time system based on the Preempt_RT patch thus exhibits good timing performance.

Figure 27 shows a histogram of the corresponding cycle jitter. The distribution is roughly normal, with the majority of data points lying within about ±10 μs.

Figure 27 Histogram-based statistics of periodic jitter

4.7 Evaluation of Timing Jitter Performance for Multiple Subscribing Nodes
In the previous sections, we focused on end-to-end latency between two nodes. We analyzed the real-time performance of ROS1 and ROS2 and investigated the impact of different factors on ROS2's real-time performance, such as DDS, QoS, frequency, and throughput. In practical applications, however, a single node may publish messages that are received by multiple nodes. In this section, we extend the real-time analysis to one publisher and six subscribers and measure the latency of each receiving node.

Figure 28 shows the latency in ROS1: the latency differs significantly between the subscribing nodes. Because ROS1 handles message publishing and reception sequentially, it is not suitable for real-time systems. For instance, when the data size is 4 Mb, the maximum latency among the subscribing nodes is nearly twice the minimum latency. In contrast, the latency in ROS2 depends mainly on the packet size, and the latency deviation among subscribers is small, as shown in Figures 29 and 30. All subscribers are thus treated relatively fairly in ROS2, which shows that ROS2 message publishing is fairer to multiple subscribing nodes than ROS1. After real-time optimization, ROS2 shows improved real-time performance for multi-node subscriptions compared with the unoptimized system.
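The fairness comparison across subscribers can be condensed into one number. The hypothetical subscriber_spread function below uses an illustrative metric that the paper does not report. It averages each subscriber's latency samples and divides the largest mean by the smallest. A ratio near 1 means the six subscribers are served evenly. A ratio approaching 2 roughly resembles the ROS1 case described above, where the slowest subscriber sees nearly twice the latency of the fastest.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct SpreadStats {
  double min_mean_us;  // smallest per-subscriber mean latency
  double max_mean_us;  // largest per-subscriber mean latency
  double ratio;        // max_mean_us / min_mean_us (1.0 means perfectly even)
};

// samples[i] holds the latency samples, in microseconds, of subscriber i.
SpreadStats subscriber_spread(const std::vector<std::vector<double>>& samples) {
  assert(!samples.empty());
  std::vector<double> means;
  means.reserve(samples.size());
  for (const auto& s : samples) {
    assert(!s.empty());
    means.push_back(std::accumulate(s.begin(), s.end(), 0.0) / static_cast<double>(s.size()));
  }
  const auto mm = std::minmax_element(means.begin(), means.end());
  assert(*mm.first > 0.0);  // latencies are positive, so the ratio is defined
  return {*mm.first, *mm.second, *mm.second / *mm.first};
}
```

Because it uses means, this metric is less sensitive to single outliers than a max/min comparison. Reporting both would show whether unfairness is systematic or occasional.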

Figure 28 Real-time performance of multiple subscriber nodes in ROS1



Figure 29 Real-time performance of multiple subscriber nodes in the native system ROS2

Figure 30 Real-time performance of the optimized multiple subscriber nodes in ROS2
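Both the cycle-jitter test of Section 4.6 and the fixed-frequency publishing in the experiment described next rely on a periodic loop. The following Linux/POSIX sketch is not the authors' benchmark code. It shows the usual cyclictest-style pattern: sleep until an absolute deadline with clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, ...), record the wake-up time $t_k$, and evaluate Eqs. (3) and (4) afterwards. The helpers try_enable_rt, run_periodic, cycle_jitter, and absolute_deviation are hypothetical names. try_enable_rt requests SCHED_FIFO and mlockall, which normally need elevated privileges. Timestamps are in nanoseconds, and $d_k$ is measured from the first wake-up; both are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sched.h>
#include <sys/mman.h>
#include <time.h>
#include <vector>

constexpr std::int64_t kNsPerSec = 1'000'000'000;

// Convert a timespec to nanoseconds.
std::int64_t to_ns(const timespec& ts) {
  return static_cast<std::int64_t>(ts.tv_sec) * kNsPerSec + ts.tv_nsec;
}

// Optional: request SCHED_FIFO and lock all memory. Both usually need
// privileges (root, CAP_SYS_NICE, CAP_IPC_LOCK); returns false if either fails.
bool try_enable_rt(int priority) {
  sched_param param{};
  param.sched_priority = priority;
  const bool fifo_ok = sched_setscheduler(0, SCHED_FIFO, &param) == 0;
  const bool lock_ok = mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
  return fifo_ok && lock_ok;
}

// Wake up every period_ns nanoseconds using absolute deadlines, so a late
// wake-up does not shift later deadlines, and record each wake-up time t_k.
std::vector<std::int64_t> run_periodic(std::int64_t period_ns, int cycles) {
  assert(period_ns > 0 && cycles > 0);
  std::vector<std::int64_t> wakeups;
  wakeups.reserve(static_cast<std::size_t>(cycles));
  timespec next{};
  clock_gettime(CLOCK_MONOTONIC, &next);
  for (int k = 0; k < cycles; ++k) {
    // Advance the absolute deadline by one period and normalize the timespec.
    const std::int64_t deadline = to_ns(next) + period_ns;
    next.tv_sec = static_cast<time_t>(deadline / kNsPerSec);
    next.tv_nsec = static_cast<long>(deadline % kNsPerSec);
    // With TIMER_ABSTIME, retrying after a signal keeps the same deadline.
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {
    }
    timespec now{};
    clock_gettime(CLOCK_MONOTONIC, &now);
    wakeups.push_back(to_ns(now));
  }
  return wakeups;
}

// Eq. (4): P_k = t_{k+1} - t_k - T for each pair of consecutive wake-ups.
std::vector<std::int64_t> cycle_jitter(const std::vector<std::int64_t>& t, std::int64_t period_ns) {
  std::vector<std::int64_t> p;
  for (std::size_t k = 0; k + 1 < t.size(); ++k) {
    p.push_back(t[k + 1] - t[k] - period_ns);
  }
  return p;
}

// Eq. (3): d_k = t_k - kT, with time measured from the first wake-up t_0.
std::vector<std::int64_t> absolute_deviation(const std::vector<std::int64_t>& t, std::int64_t period_ns) {
  std::vector<std::int64_t> d;
  for (std::size_t k = 0; k < t.size(); ++k) {
    d.push_back(t[k] - t[0] - static_cast<std::int64_t>(k) * period_ns);
  }
  return d;
}
```

A 1 kHz, 10000-cycle run would call run_periodic(1'000'000, 10000), optionally after try_enable_rt, and then inspect the largest $|P_k|$ from cycle_jitter. Sleeping to absolute deadlines keeps one late wake-up from shifting every later cycle, so $d_k$ stays bounded instead of drifting. Without privileges or a Preempt_RT kernel the loop still runs, but without real-time guarantees.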

Furthermore, we conducted an in-depth analysis of the latency of data transmission at various frequencies on the Preempt_RT-optimized system. Using the default Fast DDS as the message-passing middleware, we measured the latency of sending and receiving messages at different frequencies with a fixed message size of 1 KB. Each frequency was tested 120 times. The results are plotted as a 3D graph in Figure 31 and as the corresponding curves in Figure 32. The figures show that the latency deviation is small at all frequencies under real-time constraints. Note that the latency also depends on the size of the transmitted data, which was fixed at 1 KB in this experiment. The maximum latency was less than 150 µs.

Figure 31 Latency distribution of ROS2 at different frequencies
Figure 32 Real-time performance of Preempt_RT-Linux-ROS2

4.8 Real‑time Performance of EtherCAT Master
The EtherCAT master needs to run on a real-time system to ensure strict real-time performance. The robot system
platform (see Figure 2) has one EtherCAT master and 16 EtherCAT slaves. In the experiment, the cycle period of the master station was set to 1000 µs, and cycle timing data of the EtherCAT master were recorded for 1613610 ms (about 27 min), as shown in Figure 33. The minimum period of the master station was 982.5 µs, the average period was 999.8 µs, and the maximum period was 1022.4 µs. The EtherCAT master therefore exhibits good real-time performance and can be applied to robot control.

Figure 33 Real-time performance of EtherCAT master

5 Conclusions
This paper optimizes and evaluates the real-time performance of ROS2 for a robotic control system, using a method that combines fair and first-in-first-out scheduling strategies. Building on the DDS transmission mechanism of ROS2, the method uses Preempt_RT to construct a fully preemptible, event-driven system kernel, thereby improving the timeliness and reliability of ROS2 data transmission.

We evaluate the real-time performance of ROS1 and ROS2 both qualitatively and quantitatively. The evaluation considers throughput, transmission method, QoS policy, frequency, number of subscribing nodes, and the EtherCAT master. Our findings indicate that the optimized ROS2 with Preempt_RT achieves reliable real-time performance. The results also show that ROS2 outperforms ROS1 in real-time performance for large-scale data transmission and multi-node subscription.

Specific conclusions of the paper are as follows.

(1) The key to improving the real-time performance of ROS2 lies in optimizing the real-time performance of the operating system. The Preempt_RT patch reduces the latency of ROS2 message transmission. Preempt_RT improves the real-time computing capability of the native Linux kernel through high-resolution timers, threaded interrupt handlers, sleeping spinlocks, real-time mutexes, and preemptible RCU synchronization.
(2) The real-time performance of the optimized ROS2 system was systematically and comprehensively evaluated under stringent operating conditions with the CPU running at full load. The system demonstrated stable real-time performance, running for 25.7 h with a maximum latency of 82 µs and an average latency of 2 µs.
(3) This study compares the real-time performance of ROS1 and ROS2, both of which run in the application layer above the Linux kernel. The real-time performance of the optimized Preempt_RT-Linux-ROS2 is much better than that of Native-Linux-ROS2. Additionally, when a single publisher node serves multiple subscriber nodes, ROS2 delivers messages more fairly than ROS1, making ROS2 more suitable for the development of real-time control systems.
(4) The optimized Preempt_RT-Linux maintains stable average and maximum jitter at different frequencies, with a maximum cycle jitter of less than 60 µs. The study also measures the real-time performance of the EtherCAT master: with a nominal cycle of 1000 µs, the worst-case cycle was 1022.4 µs, demonstrating the effectiveness of the optimized system.

Acknowledgements
Not applicable.

Authors' Contributions
YY designed and wrote the paper, XL and ZN completed manuscript revisions, FX provided suggestions and guidance, ZL assisted with programming, and PL provided assistance in device construction. All authors read and approved the final manuscript.

Authors' Information
Yanlei Ye, born in 1991, is currently a Ph.D. candidate at the Department of Mechanical Engineering (DME), Tsinghua University, China. His research interests include robot operating systems and compliant motion control.
Zhenguo Nie, born in 1983, is currently an associate professor at DME, Tsinghua University, China. His research interests include intelligent design and surgical robotics.
Xinjun Liu, born in 1971, is currently a professor and a Ph.D. candidate supervisor at DME, Tsinghua University, China. His research interests include robotics, parallel mechanisms, and advanced manufacturing equipment.
Fugui Xie, born in 1982, is currently an associate professor and a Ph.D. candidate supervisor at DME, Tsinghua University, China. His research interests include parallel mechanisms and mobile machining robots.
Zihao Li, born in 1992, is currently a Ph.D. candidate at DME, Tsinghua University, China. His research interests include cooperative robots and teleoperation.
Peng Li, born in 1989, is currently a Ph.D. candidate at DME, Tsinghua University, China. His research interests include collaborative robot design and control.

Funding
Supported by National Key Research and Development Program of China (Grant No. 2019YFB1309900), and Institute for Guo Qiang, Tsinghua University of China (Grant No. 2019GQG0007).

Declarations

Competing Interests
The authors declare no competing financial interests.

Received: 22 February 2023 Revised: 11 November 2023 Accepted: 12 November 2023
