AI Routers & Network Mind: A Hybrid Machine Learning Paradigm For Packet Routing
1. Introduction
Recently, networks throughout the world are undergoing profound restructuring and transformation with the development of Software-Defined Networking (SDN), Network Function Virtualization (NFV), and 5th-generation wireless systems (5G). The new networking paradigms are eroding the dominance of traditional ossified architectures and reducing dependence on proprietary hardware. However, the corresponding improvements in network flexibility and scalability are also presenting unprecedented challenges for network management. In particular, with the emergence and development of new services and scenarios (such as the IoT paradigm and AR/VR), network scales and traffic volumes are exhibiting explosive growth, and the QoS/QoE requirements are becoming increasingly demanding. This ever-increasing network complexity makes effective network control extremely difficult. In particular, current control strategies largely rely on manual processes, which have poor scalability and robustness for the control of complex systems. Therefore, there is an urgent need for more powerful methods of addressing the challenges faced in networking.

In recent years, with the great success of machine learning, applications of Artificial Intelligence and Machine Learning (AI&ML) in networking have received considerable attention [1], [2]. Compared to meticulously manually designed (white-box) strategies, AI&ML (black-box) techniques offer enormous advantages in networking systems. For example, AI&ML provides a generalized model and uniform learning method without prespecified processes for various network scenarios [3]. In addition, such techniques can effectively handle complex problems and high-dimensional situations; indeed, AI&ML methods have already achieved remarkable success in many complex system control domains, including computer games and robotic control [4]. In addition to the enormous advantages of AI&ML for networking, the development of new network techniques is also providing fertile ground for AI&ML deployment. For example, In-band Network Telemetry (INT) enabled end-to-end network visualization at the millisecond scale in 2015, and Cisco published a big data analytics platform for networking, PNDA, in 2017. Therefore, the growing trend of applying AI&ML in networking is being driven by both task requirements (the increasing complexity of networks and increasingly demanding QoS/QoE requirements) and technological developments (new network monitoring technologies and big data analysis techniques) [5].

The AI&ML-driven networking paradigm was first put forward by D. Clark et al. in [6], where “A Knowledge Plane for the Internet” for network operations using AI&ML was proposed. However, learning based on distributed nodes with only [...]

[...] centralized intelligent paradigm for AI-driven networking called Knowledge-Defined Networking (KDN), in which control strategies are generated in a centralized knowledge plane enabled by ML algorithms. However, as the network scale expands, the centralized paradigm incurs excessive overhead in terms of both communication and computation, especially for real-time network control tasks (such as traffic routing). This overhead will certainly introduce large delays that will further degrade the performance of AI-based algorithms.

As discussed above, both the distributed and centralized paradigms are imperfect and have fundamental flaws. Therefore, in this paper, we propose a hybrid AI-driven paradigm for traffic routing control in which we combine a distributed intelligence, based on units called “AI routers,” with a centralized intelligence platform, called the “network mind,” to support different network services. Specifically, we separately consider centralized intelligent control for tunneling-based routing and distributed intelligence for hop-by-hop routing. In addition, we apply two kinds of ML algorithms to optimize traffic routing control strategies to satisfy network service requirements, such as congestion control and QoS/QoE guarantees.

The main contributions of this paper are briefly summarized below.
- We propose a hybrid ML paradigm for packet routing, in which we combine a distributed intelligence based on AI routers with a centralized intelligence platform called the network mind.
- For tunneling-based routing (with a high-QoS guarantee), we discuss the feasibility and superiority of centralized optimization and deploy a deep-reinforcement-learning-based routing strategy in the network mind for route optimization.
- For hop-by-hop routing, we shift the responsibility for intelligent control to each AI router to ease the overhead imposed by centralized control and use the network mind to improve the global convergence.

The rest of this paper is organized as follows. In Section 2, we review the related work on AI-driven network traffic routing. In Section 3, we discuss the placement of the intelligent control plane and propose a hybrid architecture for various tasks. In Section 4, we propose a centralized AI-based routing algorithm for high-QoS network services. In Section 5, we design a hybrid routing architecture to address the distributed congestion control problem. In Section 6, several challenges and open issues are presented.

2. Related Work
Although AI-driven networking is currently a research area of considerable interest, the idea of applying ML in traffic routing
[...]zation among all routers is extremely difficult, especially with increasing network size, speed, and load. With the development of SDN technology, centralized AI-driven routing strategies have received considerable attention.

2.2. Centralized Routing
In [18], Stampa et al. proposed a deep RL (DRL) algorithm for optimizing routing in a centralized knowledge plane. Benefiting from the global control perspective, the experimental results showed very promising performance. In [19], Lin et al. applied the SARSA algorithm to achieve QoS-aware adaptive routing in multilayer hierarchical software-defined networks.

FIGURE 1 The closed-loop control paradigm. [Figure labels: Intelligent Plane; Data Mining; Load Balance; QoS; Mice Flow; Flow; Action; Monitor Data; Delay; Throughput; Packet Loss; Forwarding Plane; Packet Forwarding Engine; Packets In; Packets Out]
[Figure labels: AI Router; Awareness Plane; Traffic Aware; Hardware State; Distributed Intelligence; Decision; Cooperation; Forwarding Plane; Forwarding Engine]

[...]cy evaluation (critic).

During the learning process, the DDPG agent first selects an action based on the current strategy:
$a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t.$ (1)
Then, the agent executes the action $a_t$ and observes the reward $r_t$ and the new state $s_{t+1}$ of the underlying network. During training, a replay memory $R$ is used to eliminate the temporal correlations between data. The transition data $(s_t, a_t, r_t, s_{t+1})$ for the current step are stored in $R$, and then, a random minibatch [...]

FIGURE 5 The centralized intelligent control scheme. [Figure labels: SDN Controller; Tunneling; User 1; User 2]
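The action-selection rule in Eq. (1) and the replay memory $R$ can be illustrated with a minimal sketch. It is not the authors' Keras/TensorFlow implementation: the linear `actor` with tanh squashing, the Gaussian exploration noise, the placeholder reward, and all names (`select_action`, `ReplayMemory`, `weights`) are assumptions made purely for illustration.

```python
import math
import random
from collections import deque


def actor(state, weights):
    """Hypothetical actor standing in for mu(s | theta^mu): linear layer plus tanh."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def select_action(state, weights, noise_std, rng):
    """Eq. (1): deterministic actor output plus exploration noise N_t.

    Gaussian noise is an assumption; the source does not specify the noise process.
    """
    return [a + rng.gauss(0.0, noise_std) for a in actor(state, weights)]


class ReplayMemory:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity):
        # The oldest transitions are discarded once capacity is reached.
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions.
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


rng = random.Random(0)
weights = [[0.5, -0.2], [0.1, 0.3]]  # illustrative actor parameters
memory = ReplayMemory(capacity=1000)

state = [0.4, 0.7]
for _ in range(50):
    action = select_action(state, weights, 0.1, rng)
    reward = -sum(abs(a) for a in action)  # placeholder reward, not the paper's
    next_state = [rng.random(), rng.random()]  # placeholder environment step
    memory.store(state, action, reward, next_state)
    state = next_state

batch = memory.sample(8, rng)  # random minibatch for one training update
```

In a full DDPG agent, the sampled minibatch would feed the critic and actor updates; here the environment and reward are stand-ins, so only the data flow is meaningful.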
Furthermore, the actor policy is updated using the sampled policy gradient with the aim of maximizing the discounted cumulative reward, which can be described as follows:
$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s_i}.$ (4)

Compared to traditional heuristic-based algorithms, DRL possesses several advantages for networking control. First, due to the strong generalization ability of neural networks, DRL can generate knowledge directly from nonlinear, complex, high-dimensional network systems without requiring assumptions and [...]

[...] where the first layer had 50 hidden units and the second had 40 hidden units. In our experiment, we applied OMNeT++ for network traffic simulation and Keras and TensorFlow for DDPG agent construction.

In our experiment, the network state was represented by the transmission delay and node processing delay, each action was represented by the set of nodes defining the forwarding path from the source node to the destination node, and the reward was represented by the total delay for forwarding from the source to the destination.

The learning process of the DDPG agent is illustrated in Fig. 6. With an increasing number of training steps, the DDPG agent gradually converges to the optimal strategy. In addition, [...]

FIGURE 7 The average delivery time versus different network loads. [Plot: x-axis, network load; y-axis, average delivery time (ms)]

5. AI Routers & Network Mind
Although tunneling-based protocols have advantages in terms of traffic engineering and QoS guarantees, full-mesh tunneling across the whole network will result in operational complexity and limited scalability. Therefore, as shown in Fig. 8, we propose a hybrid AI-based hop-by-hop routing paradigm. In our architecture, for easing the overhead imposed by centralized [...]
FIGURE 8 The decentralized intelligent control scheme. [Figure labels: Network Mind; Network Analytics Platform; Network State; QoS; Decision Support; Full State; AI Router (Obs1, Act1); AI Router (Obs2, Act2)]

However, with the objective of achieving the maximum cumulative reward, this local reward signal encourages only selfish behavior. Therefore, to facilitate cooperation among the AI routers, we implement a difference reward to modify the reward signal by removing much of the noise introduced by other routers. The difference reward is defined as follows:
$D_i(s, a) = G(s, a) - G(s, a_{-i}).$ (6)
Here, $G(s, a)$ is the global reward, which reflects the global utility of the whole system based on the joint actions executed by the multiple AI routers. The global reward is defined as the sum of all the local rewards:
$G(t) = \sum_{o_i \in O} L(o_i, t).$ (7)
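To make Eqs. (6) and (7) concrete, the following toy sketch computes a difference reward for routers that each pick one outgoing link. Everything specific here is an assumption for illustration only: the congestion-style `local_reward`, the omission of the state $s$, and the treatment of $a_{-i}$ as the joint action with router $i$ removed (marked by `None`). The paper's actual local reward is not reproduced.

```python
from collections import Counter


def local_reward(link, load):
    """Hypothetical local reward L: minus the number of routers sharing `link`."""
    return -float(load[link])


def global_reward(actions):
    """Eq. (7): G is the sum of the local rewards of all participating routers.

    actions[i] is the link chosen by router i, or None if router i is excluded.
    """
    active = [a for a in actions if a is not None]
    load = Counter(active)
    return sum(local_reward(a, load) for a in active)


def difference_reward(actions, i):
    """Eq. (6): D_i = G(s, a) - G(s, a_{-i})."""
    counterfactual = list(actions)
    counterfactual[i] = None  # assumption: a_{-i} drops router i's action
    return global_reward(actions) - global_reward(counterfactual)


joint = ["A", "A", "B"]  # routers 0 and 1 share link A
congested = difference_reward(joint, 0)            # router 0 stays on link A
rerouted = difference_reward(["C", "A", "B"], 0)   # router 0 moves to idle link C
```

In this toy setting, router 0 receives a higher difference reward for moving to the idle link than for staying on the shared one, because the counterfactual term cancels the contributions that do not depend on its own choice. That is the sense in which the difference reward removes noise introduced by other routers.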
FIGURE 9 The global utility of the whole system. [Plot: x-axis, iterations (0 to 40,000); y-axis ticks from 1.5 to 3.5]

[...] for future networks, but many challenges still remain, and much more work needs to be done. In this section, we will discuss the major challenges and open issues regarding AI&ML-driven networking.

6.1. New Hardware Architectures
Every innovation with regard to upper-level services is based on significant advances in the performance of the underlying hardware, such as the Central Processing Unit (CPU) for general-purpose computations, the Digital Signal Processor (DSP) for a communication system, and the Graphics Processing Unit