
AI Routers & Network Mind: A Hybrid Machine Learning Paradigm for Packet Routing

Haipeng Yao and Tianle Mai
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, CHINA

Chunxiao Jiang and Linling Kuang
Tsinghua Space Center, Tsinghua University, Beijing, CHINA

Song Guo
Department of Computing, Hong Kong Polytechnic University, HONG KONG SAR

Michael Margaliot
Tel Aviv University, Israel

Digital Object Identifier 10.1109/MCI.2019.2937609
Date of current version: 14 October 2019
Corresponding Author: Haipeng Yao (yaohaipeng@bupt.edu.cn)

Abstract—With the increasing complexity of network topologies and architectures, adding intelligence to the network control plane through Artificial Intelligence and Machine Learning (AI&ML) is becoming a trend in network development. For large-scale geo-distributed systems, determining how to appropriately introduce intelligence in networking is the key to high-efficiency operation. In this treatise, we explore two deployment paradigms (centralized vs. distributed) for AI-based networking. To achieve the best results, we propose a hybrid ML paradigm that combines a distributed intelligence, based on units called "AI routers," with a centralized intelligence, called the "network mind," to support different network services. In the proposed paradigm, we deploy centralized AI control for connection-oriented tunneling-based routing protocols (such as multiprotocol label switching and segment routing) to guarantee a high QoS, whereas for hop-by-hop IP routing, we shift the intelligent control responsibility to each AI router to ease the overhead imposed by centralized control and use the network mind to improve the global convergence.

1. Introduction
Recently, networks throughout the world are undergoing profound restructuring and transformation with the development of Software-Defined Networking (SDN), Network Function Virtualization (NFV), and 5th-generation wireless systems (5G). The new networking paradigms are eroding the dominance of traditional ossified architectures and reducing dependence on proprietary hardware. However, the corresponding improvements in network flexibility and scalability are also presenting unprecedented challenges for network management. In particular, with the emergence and development of new services and scenarios (such as the IoT paradigm and AR/VR), network scales and traffic volumes are exhibiting explosive growth, and the QoS/QoE requirements are becoming increasingly demanding. This ever-increasing network complexity makes effective network control extremely difficult. In particular, current control strategies largely rely on manual processes, which have poor scalability and robustness for the control of complex systems. Therefore, there is an urgent need for more powerful methods of addressing the challenges faced in networking.

In recent years, with the great success of machine learning, applications of Artificial Intelligence and Machine Learning (AI&ML) in networking have received considerable attention [1], [2]. Compared to meticulously manually designed (white-box) strategies, AI&ML (black-box) techniques offer enormous advantages in networking systems. For example, AI&ML provides a generalized model and uniform learning method without prespecified processes for various network scenarios [3]. In addition, such techniques can effectively handle complex problems and high-dimensional situations; indeed, AI&ML methods have already achieved remarkable success in many complex system control domains, including computer games and robotic control [4]. In addition to the enormous advantages of AI&ML for networking, the development of new network techniques is also providing fertile ground for AI&ML deployment. For example, In-band Network Telemetry (INT) enabled end-to-end network visualization at the millisecond scale in 2015, and Cisco published a big data analytics platform for networking, PNDA, in 2017. Therefore, the growing trend of applying AI&ML in networking is being driven by both task requirements (the increasing complexity of networks and increasingly demanding QoS/QoE requirements) and technological developments (new network monitoring technologies and big data analysis techniques) [5].

The AI&ML-driven networking paradigm was first put forward by D. Clark et al. in [6], where "A Knowledge Plane for the Internet" for network operations using AI&ML was proposed. However, learning based on distributed nodes with only a partial perspective on the network is a complex task, especially with the goal of global optimization. This fundamental defect has resulted in the stagnation of knowledge plane development. In recent years, benefiting from developments in SDN technology, a centralized intelligent network architecture has become a feasible solution. In [7], Mestres et al. proposed a centralized intelligent paradigm for AI-driven networking called Knowledge-Defined Networking (KDN), in which control strategies are generated in a centralized knowledge plane enabled by ML algorithms. However, as the network scale expands, the centralized paradigm incurs excessive overhead in terms of both communication and computation, especially for real-time network control tasks (such as traffic routing). This overhead will certainly introduce large delays that will further degrade the performance of AI-based algorithms.

As discussed above, both the distributed and centralized paradigms are imperfect and have fundamental flaws. Therefore, in this paper, we propose a hybrid AI-driven paradigm for traffic routing control in which we combine a distributed intelligence, based on units called "AI routers," with a centralized intelligence platform, called the "network mind," to support different network services. Specifically, we separately consider centralized intelligent control for tunneling-based routing and distributed intelligence for hop-by-hop routing. In addition, we apply two kinds of ML algorithms to optimize traffic routing control strategies to satisfy network service requirements, such as congestion control and QoS/QoE guarantees.

The main contributions of this paper are briefly summarized below.
❏ We propose a hybrid ML paradigm for packet routing, in which we combine a distributed intelligence based on AI routers with a centralized intelligence platform called the network mind.
❏ For tunneling-based routing (with a high-QoS guarantee), we discuss the feasibility and superiority of centralized optimization and deploy a deep-reinforcement-learning-based routing strategy in the network mind for route optimization.
❏ For hop-by-hop routing, we shift the responsibility for intelligent control to each AI router to ease the overhead imposed by centralized control and use the network mind to improve the global convergence.

The rest of this paper is organized as follows. In Section 2, we review the related work on AI-driven network traffic routing. In Section 3, we discuss the placement of the intelligent control plane and propose a hybrid architecture for various tasks. In Section 4, we propose a centralized AI-based routing algorithm for high-QoS network services. In Section 5, we design a hybrid routing architecture to address the distributed congestion control problem. In Section 6, several challenges and open issues are presented.

2. Related Work
Although AI-driven networking is currently a research area of considerable interest, the idea of applying ML in traffic routing
can be traced back to the 1990s. In this section, we review the related work on AI-driven network routing algorithms.

2.1. Decentralized Routing

2.1.1. Single-Agent Reinforcement Learning
In [8], Boyan et al. proposed the Q-routing algorithm for optimizing packet routing control. In the Q-routing algorithm, each router updates its policy according to its Q-function based on local information and communication. The experiments showed that Q-routing offered more efficient performance than the nonadaptive shortest path algorithm, especially under a high workload. In [9], Choi et al. proposed a memory-based Q-learning algorithm called predictive Q-routing to increase the learning rate and convergence speed by retaining past experiences. In addition, in [10], Kumar et al. proposed dual reinforcement Q-routing (DRQ-routing), which uses information gained through backward and forward exploration to accelerate the convergence speed. In [11], [12], Reinforcement Learning (RL) was successfully applied in wireless sensor network routing, where the sensors and sink nodes could self-adapt to the network environment. However, in a multiagent system, single-agent RL suffers from severe non-convergence. Instead, applying multiagent RL to improve the cooperation among network nodes is more feasible, and there have been a series of works on ML-driven routing based on multiagent RL.
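To make the tabular Q-routing update concrete before turning to the multiagent variants, the sketch below transcribes the rule from [8] in Python; the variable names and the learning rate value are our own illustrative choices, not notation from the original paper.

```python
# Q-routing sketch after Boyan and Littman [8]: Q[y][d] is this router's
# estimate of the remaining delivery time for a packet bound for
# destination d if it is forwarded to neighbor y.

def select_next_hop(Q, neighbors, d):
    """Greedy policy: forward toward the neighbor with the lowest estimate."""
    return min(neighbors, key=lambda y: Q[y][d])

def q_routing_update(Q, y, d, queue_time, transit_time, best_remaining,
                     alpha=0.5):
    """Update after sending a packet for destination d to neighbor y.

    queue_time:     time the packet waited in the local queue
    transit_time:   transmission time to neighbor y
    best_remaining: y's own best estimate, min over z of Q_y[z][d],
                    reported back by y on receipt
    """
    target = queue_time + transit_time + best_remaining
    Q[y][d] += alpha * (target - Q[y][d])
```

Each router thus learns delivery-time estimates from purely local feedback, which is what lets Q-routing outperform the static shortest path under heavy load.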
2.1.2. Multiagent Reinforcement Learning
In [13], [14], Stone et al. proposed the Team-Partitioned Opaque-Transition RL (TPOT-RL) routing algorithm, which allows a team of network nodes working together toward a global goal to learn how to perform a collaborative task. In [15], Wolpert et al. designed a sparse reinforcement learning algorithm named the Collective Intelligence (COIN) algorithm, in which a global function is applied to modify the behavior of each network agent. In contrast, the author of [16] proposed a Collaborative RL (CRL)-based routing algorithm with no single global state. The CRL approach was also successfully applied for delay-tolerant network routing in [17]. However, in an inherently distributed system, state synchronization among all routers is extremely difficult, especially with increasing network size, speed, and load. With the development of SDN technology, centralized AI-driven routing strategies have received considerable attention.

2.2. Centralized Routing
In [18], Stampa et al. proposed a deep RL (DRL) algorithm for optimizing routing in a centralized knowledge plane. Benefiting from the global control perspective, the experimental results showed very promising performance. In [19], Lin et al. applied the SARSA algorithm to achieve QoS-aware adaptive routing in multilayer hierarchical software-defined networks. For each flow, the controller updated the optimal routing strategy based on the QoS requirements and issued the forwarding table to each node along the forwarding path. In [20], Wang et al. proposed an RL-based routing algorithm for Wireless Sensor Networks (WSNs) named AdaR. In AdaR, Least-Squares Policy Iteration (LSPI) is implemented to achieve the correct tradeoff among multiple optimization goals, such as the routing path length, load balance, and retransmission rate. However, the overhead incurred for centralized AI control is high.

FIGURE 1 The closed-loop control paradigm: an intelligent control plane (policies such as load balance, queue management, and segment routing), an awareness plane (data mining over delay, throughput, packet loss, and elephant/mice flow QoS from monitor data), and a forwarding plane (packet forwarding engines), linked by observation, policy, and action flows.

3. AI-Driven Network Routing
In this section, we first propose a three-layer logical functionality architecture for AI-driven networking. Then, we discuss the problem of how far away the intelligent control plane can be located from the forwarding plane ("centralized" or "distributed").

3.1. Closed-Loop Control Paradigm
In a traditional network, the network layer functionality can be divided into the forwarding plane and the control plane. However, with the introduction of AI&ML, this two-layer architecture cannot effectively describe the logic of intelligent system operation. In this paper, inspired by the closed-loop mechanism of the learning process of the human brain ("observation - judgment - action - learning"), we split the functionality of AI-based networking into three layers to
construct a closed-loop network control paradigm. As illustrated in Fig. 1, our paradigm consists of three layers, called the forwarding plane, the awareness plane, and the intelligent control plane.

The forwarding plane is responsible for forwarding data packets from one interface to another in distributed network equipment. Its operation logic relies completely on the forwarding table and configuration instructions issued by the control plane.

The purpose of the awareness plane is to monitor the network status and upload the results to the control plane. Network monitoring and awareness are prerequisites for ML-based control and optimization. Therefore, we abstract this new layer called the awareness plane for the collection and processing of monitoring data (for tasks such as network device monitoring and network traffic identification) to provide network status information.

The intelligent control plane is responsible for feeding control decisions to the forwarding plane. The AI&ML-based algorithms are deployed in this plane to transform the current and historical operation data into control policies.

These three abstract planes together constitute a closed-loop framework for AI&ML deployment in networking. In analogy to the human learning process, the forwarding plane acts as the "subject of action," the awareness plane acts as the "subject of observation," and the intelligent control plane acts as the "subject of learning/judgment." Based on these three planes for closed-loop control, an AI&ML agent can continuously learn and optimize network control and management strategies by interacting with the underlying network.
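As an illustration of this closed loop (all class and method names here are ours, not an API from the paper), the three planes can be sketched as cooperating components:

```python
import random

# Toy sketch of the three-plane closed loop; the names and the stand-in
# telemetry are illustrative assumptions, not the paper's implementation.

class AwarenessPlane:
    """Subject of observation: monitors the network status."""
    def observe(self):
        # stand-in for real telemetry: per-link utilization in [0, 1]
        return {link: random.random() for link in ("A-B", "B-C", "A-C")}

class IntelligentControlPlane:
    """Subject of learning/judgment: turns observations into a policy."""
    def decide(self, state):
        # toy policy: prefer the least-utilized link
        return min(state, key=state.get)

class ForwardingPlane:
    """Subject of action: applies the issued decision."""
    def apply(self, decision):
        print("forwarding traffic over link", decision)

def closed_loop(steps=3):
    aware, ctrl, fwd = AwarenessPlane(), IntelligentControlPlane(), ForwardingPlane()
    for _ in range(steps):  # observation - judgment - action
        fwd.apply(ctrl.decide(aware.observe()))

closed_loop()
```

In the full architecture, a learning step closes the loop: the control plane updates its ML model from the observed effect of each action rather than applying a fixed rule.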
3.2. Centralized vs. Distributed
As described above, a three-tier logical architecture is proposed. However, when this abstract logical concept is deployed in a real-world network, the placement of the intelligent control plane (centralized or distributed) is critical to the efficient operation of AI-driven networking.

How far away the control plane can be located from the data plane has long been a controversial topic. In traditional distributed networking equipment, the control plane and the forwarding plane are closely coupled. Each node has only a partial view of, and partial control over, the complete network. When AI&ML-based algorithms are applied in such a network, the learning process will suffer from severe non-convergence, particularly when a global optimum is sought. In contrast, in an SDN architecture, the control plane is decoupled from the network hardware and acts as a centralized plane in which an AI agent can interact with the whole network to generate an optimal strategy. However, while the advantages of centralized optimization are clear, the overhead of closed-loop control implemented through a centralized AI is high. This overhead includes not only the communication overhead for receiving and sending a large amount of data but also the computational overhead on the AI agent side for training and execution. In the centralized paradigm, all routers need to be programmed to build a single flow-forwarding path. In addition, every time the network status changes, the controller needs to recompute the forwarding logic. For large-scale networks with ultrahigh dynamics (on the millisecond scale), this excessive communication pressure and high computational burden are unacceptable.

As discussed above, the centralized and distributed paradigms, as the two ends of a spectrum, are both imperfect and have corresponding advantages and disadvantages. A distributed architecture carries the risk of non-convergence of the learning process, but it offers faster forwarding and processing speeds for each packet. In contrast, completely centralized learning is advantageous for global optimization but may incur excessive overheads in terms of communication and computation. Therefore, from our perspective, the centralized and distributed approaches should be treated as complementary rather than mutually exclusive. In this paper, as shown in Fig. 2, we propose a hybrid AI-driven control architecture that combines a "network mind" (centralized intelligence) with "AI routers" (distributed intelligence) to support different network services.

Before we detail the operations in our AI-based routing paradigm, let us start by reviewing the current state of development of routing protocols. Early on, the IP protocol won the battle between connectionless and connection-oriented routing and between source routing and distributed routing. As shown in Fig. 3, in the IP protocol, each router establishes a routing table based on its local information and communications. This routing table contains the next-hop node and a cost metric for each destination. Based on this hop-by-hop forwarding paradigm, a data packet needs to carry only its destination address in its header, which is beneficial for network scalability and robustness. However, because of the connectionless and distributed characteristics of the IP protocol, traditional IP routing provides poor support for traffic engineering and QoS guarantees. To support high-QoS (high-bandwidth, delay-sensitive) services, connection-oriented and source routing mechanisms have begun to receive attention once again. For example, as shown in Fig. 4, Multiprotocol Label Switching (MPLS) uses connection-oriented label switching and explicit paths (source routing) to establish temporary network tunnels between senders and receivers. This predetermined temporary tunnel routing strategy provides an easier and more efficient QoS-guarantee mechanism for service providers. However, full-mesh network tunneling is
extremely operationally complex and offers limited scalability, usually without any gain.

FIGURE 2 The proposed hybrid architecture: a network mind (centralized intelligence, built on a network analytics platform that receives the network state and QoS requirements and issues policies) cooperating with AI routers (distributed intelligence, each combining an intelligent control plane for local decisions, an awareness plane for traffic and hardware state, and a forwarding plane).

Thus, connection-oriented tunneling-based protocols for reliable service delivery and connectionless distributed protocols coexist in current networks. For these two kinds of routing mechanisms, intelligent control should be deployed in different ways. A tunneling-based protocol is essentially a centralized routing protocol, in which the source node maintains an understanding of the state of the whole network and calculates an appropriate forwarding path. Therefore, in our paradigm, we place the responsibility of intelligent control for routing optimization with the network mind. In contrast, for hop-by-hop routing, we shift the responsibility for intelligent control to the AI routers and use the network mind to facilitate cooperation among multiple AI routers. We will discuss these issues in detail in the following sections.

FIGURE 3 Hop-by-hop protocol: each router (S1, R1, R2, R3) computes the next hop toward the destination D1 locally from its own routing table.
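To make the contrast concrete, hop-by-hop forwarding amounts to a per-router lookup keyed only on the destination address. The sketch below (node names loosely follow Fig. 3; the table contents are our illustration) walks a packet toward its destination:

```python
# Hop-by-hop forwarding sketch: each router holds only
# destination -> (next hop, cost) entries, as in Fig. 3.

routing_tables = {
    "S1": {"D1": ("R1", 3)},
    "R1": {"D1": ("R3", 2)},
    "R3": {"D1": ("D1", 1)},  # final hop delivers directly
}

def forward(node, dst):
    """Walk a packet one hop at a time using only local tables."""
    path = [node]
    while node != dst:
        node, _cost = routing_tables[node][dst]
        path.append(node)
    return path

print(forward("S1", "D1"))  # ['S1', 'R1', 'R3', 'D1']
```

A tunneling-based protocol, by contrast, computes the whole explicit path at the source (S1 > R1 > R3 > D1 in Fig. 4) and pins the flow to it.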
4. Network Mind
High-QoS delivery is crucial to the success of current business models for many network applications, such as online gaming (which is sensitive to delays) and AR/VR (which requires high bandwidths). Methods of guaranteeing QoS over tunneling-based protocols (such as MPLS and segment routing) have been discussed and developed for more than a decade. However, traditional distributed signaling solutions based on RSVP-TE/MPLS-TE will result in a lack of coordination and competition for network resources, which in turn will lead to a lack of optimality, a lack of predictability, and slow convergence. Therefore, in our paradigm, to guarantee a high QoS for services that require it, the AI-based intelligent control of tunneling-based routing is deployed in a centralized way.

As shown in Fig. 5, the proposed network mind is responsible for centralized intelligent traffic control and optimization. The network mind accesses the fine-grained network state through an upload link and issues actions via a download link.
The upload link relies on a network monitoring protocol, such as INT, Kafka, or IPFIX, to gather device states, traffic characteristics, configuration data, and service-level information; the download link relies on a standard southbound interface, such as OpenFlow or P4, to facilitate efficient control over the network. The upload and download links constitute an interaction framework that provides the network mind with a global perspective and global control capabilities, and the current and historical data provided by the closed-loop operations are fed to AI&ML algorithms for generating and learning knowledge.
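As one concrete illustration of the upload link (a sketch only: the broker address, topic name, and record fields are our assumptions, not part of the paper), telemetry streamed through Kafka could be consumed by the network mind as follows:

```python
# Upload-link sketch: consume JSON telemetry records from a Kafka topic.
# The topic name "net-telemetry" and the record fields are hypothetical.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "net-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for record in consumer:
    state = record.value  # e.g., {"device": "R1", "link_util": 0.7, "delay_ms": 1.2}
    # here the network mind would fold `state` into its view of the network
    print(state)
```

The download link is the mirror image: the generated policy is pushed to the forwarding plane through a southbound interface such as OpenFlow or P4.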
However, learning control policies from complex and high-dimensional system states is a challenging task. In particular, as the network scale and granularity of monitoring increase, a dimensional explosion of the network state will occur. Recently, the success achieved in applying RL to many challenging decision-making domains, such as Go and video games, suggests that this idea may not be impossible to realize [21]. RL provides a paradigm of learning through trial and error to generate an optimal behavior policy. In particular, benefiting from the representation learning abilities of deep learning, DRL can be applied to directly construct and learn knowledge from raw high-dimensional data [22]. Therefore, in this paper, we apply DRL for effective routing policy generation.

FIGURE 4 Tunneling-based protocol: the source S1 computes the explicit path S1 > R1 > R3 > D1 and establishes a tunnel to the destination D1.

FIGURE 5 The centralized intelligent control scheme: the network mind observes the network state through a network analytics platform and issues decisions through an SDN controller, which sets up tunnels between users.

4.1. Modeling and Formulation
In the context of RL, the Markov Decision Process (MDP) is a useful mathematical framework for tackling related problems. The MDP is an abstract framing of the problem of learning via interaction to achieve a certain control and optimization goal. In our scenario, the network mind and the underlying network environment construct an MDP environment and continually interact to generate control strategies. In each step, the centralized AI agent observes the network state $s_t$ from the underlying network and makes a routing decision in accordance with the current strategy $\pi(a \mid s)$. Following this decision, the controller issues the corresponding policy to the network nodes along the forwarding path. Then, the network transitions into the next state $s_{t+1}$, and the AI agent obtains an immediate reward $R$ from the environment. Specifically, the network state can be represented by network device information and traffic characteristic information, and the actions can be represented by the forwarding path. The reward function evaluates the effectiveness of the actions taken with respect to the optimization target (such as a delay requirement or throughput guarantee).

In this paper, we apply the Deep Deterministic Policy Gradient (DDPG) approach for policy generation [23]. A DDPG agent consists of two components: the deterministic policy network (actor) $\mu(s; \theta^\mu)$ and the Q-network (critic) $Q(s, a; \theta^Q)$. The actor attempts to improve the current policy $\mu(s; \theta^\mu)$ based on the policy gradient, and the critic evaluates the quality of the current policy with the parameters $\theta^\mu$. The DDPG agent implements an iterative policy mechanism that alternates between policy improvement (actor) and policy evaluation (critic).

During the learning process, the DDPG agent first selects an action based on the current strategy:

$a_t = \mu(s_t; \theta^\mu) + \mathcal{N}_t$. (1)

Then, the agent executes the action $a_t$ and observes the reward $r_t$ and the new state $s_{t+1}$ of the underlying network. During training, a replay memory $R$ is used to eliminate the temporal correlations between data. The transition data $(s_t, a_t, r_t, s_{t+1})$ for the current step are stored in $R$, and then, a random minibatch


of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ is sampled from the replay memory to update the critic network by minimizing the following loss based on the ADAM optimizer [24]:

$L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i; \theta^Q) \right)^2$, (2)

where we set $y_i$ to

$y_i = r_i + \gamma Q(s_{i+1}, \mu(s_{i+1}; \theta^\mu); \theta^Q)$. (3)

Furthermore, the actor policy is updated using the sampled policy gradient with the aim of maximizing the discounted cumulative reward, which can be described as follows:

$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a; \theta^Q) \big|_{s=s_i,\, a=\mu(s_i)}\, \nabla_{\theta^\mu} \mu(s; \theta^\mu) \big|_{s_i}$. (4)
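Putting Eqs. (1)-(4) together, the following is a minimal sketch of one DDPG update step in TensorFlow 2.x with Keras (the frameworks named in Section 4.2). The state/action dimensions, learning rates, and noise scale are illustrative assumptions, and the target networks and replay-memory bookkeeping of [23] are omitted for brevity; the 50- and 40-unit hidden layers anticipate the configuration described in Section 4.2.

```python
# Minimal single-step DDPG sketch of Eqs. (1)-(4); hyperparameters are
# illustrative, and the target networks of [23] are omitted for brevity.
import tensorflow as tf

STATE_DIM, ACTION_DIM, GAMMA = 24, 4, 0.99

def mlp(in_dim, out_dim):
    # Two hidden layers (50 and 40 units), matching the setup in Section 4.2.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(in_dim,)),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(40, activation="relu"),
        tf.keras.layers.Dense(out_dim),
    ])

actor = mlp(STATE_DIM, ACTION_DIM)           # mu(s; theta^mu)
critic = mlp(STATE_DIM + ACTION_DIM, 1)      # Q(s, a; theta^Q)
actor_opt = tf.keras.optimizers.Adam(1e-4)
critic_opt = tf.keras.optimizers.Adam(1e-3)  # ADAM optimizer [24]

def select_action(s, noise_std=0.1):
    """Eq. (1): exploratory action a_t = mu(s_t) + N_t."""
    return actor(s) + tf.random.normal((ACTION_DIM,), stddev=noise_std)

def train_step(s, a, r, s2):
    """One minibatch update: Eqs. (2)-(3) for the critic, Eq. (4) for the actor."""
    y = r + GAMMA * critic(tf.concat([s2, actor(s2)], axis=1))   # Eq. (3)
    with tf.GradientTape() as tape:
        q = critic(tf.concat([s, a], axis=1))
        critic_loss = tf.reduce_mean(tf.square(y - q))           # Eq. (2)
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    with tf.GradientTape() as tape:
        # Maximizing Q(s, mu(s)) realizes the sampled policy gradient, Eq. (4).
        actor_loss = -tf.reduce_mean(critic(tf.concat([s, actor(s)], axis=1)))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))
```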
Compared to traditional heuristic-based algorithms, DRL possesses several advantages for networking control. First, due to the strong generalization ability of neural networks, DRL can generate knowledge directly from nonlinear, complex, high-dimensional network systems without requiring assumptions and simplifications. Second, as a black-box optimization approach, DRL allows reward functions to be redesigned to adapt to different network targets without modifying the algorithm model. Third, a DRL agent, once trained, can calculate a near-optimal forwarding path in one step. In contrast, heuristic-based algorithms require a large number of steps to converge to a new optimal solution whenever the network state changes. Especially in large-scale, highly dynamic networks, the resulting computational complexity will lead to serious non-convergence.

4.2. Simulation
We present simulation results to demonstrate the feasibility and correctness of our algorithm. In our experiment, we simulated a network with 12 nodes and 20 full-duplex links. To evaluate the algorithm performance under various network congestion conditions, we set 10 different levels of traffic load intensity. The traffic was generated subject to a Poisson distribution, and we set different traffic load intensities by means of the parameter $\lambda$. We used a neural network with two connected hidden layers, where the first layer had 50 hidden units and the second had 40 hidden units. In our experiment, we applied OMNeT++ for network traffic simulation and Keras and TensorFlow for DDPG agent construction.

In our experiment, the network state was represented by the transmission delay and node processing delay, each action was represented by the set of nodes defining the forwarding path from the source node to the destination node, and the reward was represented by the total delay for forwarding from the source to the destination.
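Read literally, this setup suggests a reward of the following shape (a toy illustration: the delay values are invented, and the sign convention of rewarding negative delay, so that maximizing reward minimizes delay, is our assumption):

```python
# Illustrative reward for the routing MDP of Section 4.2: the negative
# total delay (transmission + node processing) along the chosen path.

transmission_delay = {("A", "B"): 1.2, ("B", "C"): 0.8}  # ms per link
processing_delay = {"A": 0.1, "B": 0.3, "C": 0.2}        # ms per node

def reward(path):
    link_ms = sum(transmission_delay[(u, v)] for u, v in zip(path, path[1:]))
    node_ms = sum(processing_delay[n] for n in path)
    return -(link_ms + node_ms)  # maximizing reward minimizes total delay

print(reward(["A", "B", "C"]))  # -2.6
```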
The learning process of the DDPG agent is illustrated in Fig. 6. With an increasing number of training steps, the DDPG agent gradually converges to the optimal strategy. In addition, in our experiment, we compared our algorithm with the shortest path routing algorithm. As shown in Fig. 7, when the traffic load is low, congestion does not occur in the network. Therefore, the shortest path routing performs as well as the AI-based algorithm. However, with increasing load intensity, the network congestion becomes severe on the shortest forwarding path, and the AI-based routing achieves better performance than the shortest path routing. Therefore, we can conclude that the AI-based routing is effective, especially in the presence of network congestion.

FIGURE 6 The learning process of the DDPG agent: average delivery time (ms) versus training steps (2K to 150K).

FIGURE 7 The average delivery time (ms) versus different network loads, for AI-based routing and shortest path routing.

5. AI Routers & Network Mind
Although tunneling-based protocols have advantages in terms of traffic engineering and QoS guarantees, full-mesh tunneling across the whole network will result in operational complexity and limited scalability. Therefore, as shown in Fig. 8, we propose a hybrid AI-based hop-by-hop routing paradigm. In our architecture, for easing the overhead imposed by centralized
control, we shift the responsibility for intelligent control to the AI routers and use the network mind to improve the global convergence. In this section, we will detail the operations of this architecture.

With the intelligent control responsibility shifted to each router, each router acts as an independent intelligent agent, and the distributed AI agents constitute a Multi-Agent System (MAS). Each AI agent attempts to optimize its local policy by interacting with its uncertain environment with the aim of maximizing the expected cumulative reward. Compared to a single-agent system, in which the state transitions of the environment depend solely on the actions of the single agent, the state transitions of an MAS are subject to the joint actions of all agents. In other words, although each AI router makes decisions based on its own local network information, these individual decisions affect each agent's transitions and the global reward.

To improve the global utility of this MAS, the ability to share experiences among the AI routers is significant. However, the question of how to achieve such information sharing in this geo-distributed system is a key problem for high-efficiency operation. In our architecture, the centralized network mind is introduced to serve as a point of global knowledge convergence for experience sharing. The centralized network mind can access global network information via the network monitoring system and share knowledge via the download link. This centralized architecture improves the efficiency of knowledge sharing compared to that in a peer-to-peer architecture. In addition, the question of which information should be transferred is another important factor to consider; the more information is transferred, the faster the convergence speed will be, but more communication overhead will also be incurred. In this paper, we use a "difference reward" as a modified reward signal to improve the collective behavior of the AI routers, as we will describe in detail below.

5.1. Modeling and Formulation
Coordination among multiple agents can be formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). This Dec-POMDP can be described as a 5-tuple $\langle I, S, A, O, R \rangle$, where $I$ is the set of agents, $S$ is the set of states, $A$ is the set of actions, $O$ is the set of local observations of each agent, and $R$ is the set of rewards. In our scenario, in each step, each AI router takes an action $a_t$ in accordance with its local observation $o_t$ and current policy $\pi(a_t \mid o_t)$. Then, an immediate local reward $L$ is obtained from the network environment, and the network state $s_t$ transitions to the new state $s_{t+1}$.

In the MAS, the AI routers need to both cooperate with each other and compete with each other for the limited network resources. In this paper, we define that $n$ resources (such as bandwidth, cache, and computation power) exist in each router. For router $i$, its observation $o_i$ can be represented by $o_i = (\langle \omega_1, \mathrm{Cap}_1, \mathrm{Csu}_{1,t} \rangle, \ldots, \langle \omega_k, \mathrm{Cap}_k, \mathrm{Csu}_{k,t} \rangle, \ldots)$, where $\omega_k$ is the weight of resource $k$, $\mathrm{Cap}_k$ is the capacity of that resource, and $\mathrm{Csu}_{k,t}$ is the amount of that resource consumed [25]. The action $a_i$ is represented by the next-hop router, and the immediate local reward $L$ is described as follows:

$L(o_i, t) = f(o_i) = \sum_{k \in n} \omega_k\, e^{-\mathrm{Cap}_k / \mathrm{Csu}_{k,t}}$. (5)

However, with the objective of achieving the maximum cumulative reward, this local reward signal encourages only selfish behavior. Therefore, to facilitate cooperation among the AI routers, we implement a difference reward to modify the reward signal by removing much of the noise introduced by other routers. The difference reward is defined as follows:

$D_i(s, a) = G(s, a) - G(s, a_{-i})$. (6)

Here, $G(s, a)$ is the global reward, which reflects the global utility of the whole system based on the joint actions executed by the multiple AI routers. The global reward is defined as the sum of all the local rewards:

$G(t) = \sum_{o_i \in O} L(o_i, t)$. (7)

FIGURE 8 The decentralized intelligent control scheme: each AI router acts on its local observation (Obs1/Act1, Obs2/Act2), while the network mind, which sees the full network state and QoS requirements through the network analytics platform, provides decision support.
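Equations (5)-(7) translate almost line for line into code. In the sketch below, the resource tuples and the counterfactual used for $G(s, a_{-i})$ (re-evaluating the global reward with router $i$'s contribution replaced) are our illustrative reading of [25]:

```python
import math

# Each router's observation is a list of (weight, capacity, consumption)
# tuples, one per resource; the numbers below are invented for the example.

def local_reward(obs):
    """Eq. (5): L(o_i, t) = sum over k of w_k * exp(-Cap_k / Csu_{k,t})."""
    return sum(w * math.exp(-cap / csu) for (w, cap, csu) in obs)

def global_reward(all_obs):
    """Eq. (7): the sum of all local rewards."""
    return sum(local_reward(o) for o in all_obs)

def difference_reward(all_obs, i, obs_without_i):
    """Eq. (6): D_i = G(s, a) - G(s, a_{-i}); the second term re-evaluates
    the system with router i's action removed (a counterfactual)."""
    counterfactual = list(all_obs)
    counterfactual[i] = obs_without_i
    return global_reward(all_obs) - global_reward(counterfactual)

obs = [
    [(0.6, 10.0, 4.0), (0.4, 5.0, 1.0)],  # router 0: two resources
    [(0.6, 10.0, 7.0), (0.4, 5.0, 2.0)],  # router 1
]
print(global_reward(obs))
```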


Based on this, the update of the Q-value in AI router $i$ can be rewritten as follows:

$Q_i(o_t, a_t) \leftarrow D_i(s, a) + \lambda Q_i(o_{t+1}, a_{t+1})$. (8)

Based on the difference reward signal, the centralized network mind can continuously revise the strategy of each router. The underlying distributed intelligence can be trusted to adapt correctly to changes in the network state, thereby reducing the need for reaction, recomputation, and updating of the centralized AI platform.
tion and updating of the centralized AI platform.
(GPU) for image processing. Similarly, to meet the require-
5.2. Simulation ments of the AI-driven networking age, there is an urgent need
In this section, we present simulation results to demonstrate the for a specific AI networking processor [27].
feasibility and performance of our architecture and algorithm. Current networks generate millions of different types of
In our experiment, we focused on the congestion control flows every millisecond. Running AI algorithms on such
problem for the distributed routing paradigm, which is difficult massive volumes of data is extremely challenging. The com-
for traditional routing algorithms to address. puting power of current routers is far from being able to sat-
Our experimental environment was developed based on isfy the requirements for AI&ML deployment. Recently, as
[26]; we simulated a network with 4 nodes and 6 unidirec- highly parallel, multicore, multithreaded processors, GPU and
tional links and generated 400 data packets to be routed Tensor Processing Unit (TPU) chips have become a corner-
through the network. For simplicity, all these packets started stone of the AI age. Some studies have already shown that a
at the same source node and were sent to the same destina- GPU can offer improved packet processing capabilities [28].
tion node. Each packet was routed in a distributed manner by However, due to the need for high-speed processing of mas-
the AI routers. sive amounts of data (more than 10 Gb/s) and the stringent
As shown in Fig. 9, we compared our algorithm with a response delay requirements (less than 1 ms) for future net-
deterministic routing strategy and a single-agent RL algo- works, there is still a large gap between universal AI process-
rithm. For the deterministic routing strategy, all data packets ing chips and their actual deployment prospects in the
were routed along the same path. This strategy cannot respond networking field.
to the network state in a timely manner; thus, it will lead to
serious congestion problems and achieve an extremely low 6.2. Advanced Software Systems
global utility. In contrast, RL can dynamically adapt to the Currently, the handling of network data is posing challenges
congestion state of the network. However, due to the nonsta- typical of big data; recent years have seen a 3-fold increase in
tionary environment of the MAS, the learning process for sin- total IP traffic and a >60% increase in the number of devices
gle-agent RL suffers from severe non-convergence, also deployed and the amount of telemetry data streamed in near
resulting in a relatively low global score, as shown in Fig. 9. By
contrast, in our architecture, a difference reward is introduced
to modify the reward signal to enhance the collective behavior
of the AI routers, thereby improving the global utility of the 5.5
Deterministic
whole system. 5.0 Different
Local
4.5
6. Challenges and Open Issues
AI&ML-driven networking control is a promising paradigm 4.0
Global Utility

for future networks, but many challenges still remain, and 3.5
much more work needs to be done. In this section, we will
3.0
discuss the major challenges and open issues regarding
AI&ML-driven networking. 2.5

2.0
6.1. New Hardware Architectures
1.5
Every innovation with regard to upper-level services is based
on significant advances in the performance of the underlying 0 10,000 20,000 30,000 40,000
hardware, such as the Central Processing Unit (CPU) for gen- Iterations
eral-purpose computations, the Digital Signal Processor (DSP)
for a communication system, and the Graphics Processing Unit FIGURE 9 The global utility of the whole system.

NOVEMBER 2019 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 29


real time. Meanwhile, the geo-distributed nature of networking is further increasing the difficulty of the widespread deployment of platforms for network data analytics. For example, challenges arise in determining how to aggregate data such as log data, metric data, and network telemetry data; how to scale up to the consumption of millions of flows per millisecond; and how to efficiently share knowledge among distributed network nodes. The current end-to-end solutions, which combine multiple technologies such as Apache Spark and Hadoop MapReduce, are extremely complex and time-consuming. Therefore, a powerful, scalable, big data analytics platform for networks and network services is needed [27].

In addition, software libraries for ML networking tasks are another important enabler for AI-based networking. ML frameworks offer high-level programming interfaces for designing, training, and validating ML algorithms. However, current ML frameworks, such as TensorFlow, Caffe, and Theano, are designed for general-purpose tasks and impose too heavy a burden for the networking domain. They need to be further optimized to satisfy the requirements for networking applications, such as high processing speed, low complexity, and light weight.

6.3. Promoting ML Algorithms
While myriad ML algorithms have been developed, current ML algorithms are typically driven by the needs of specific existing applications, such as Computer Vision (CV) and Natural Language Processing (NLP). For example, convolutional neural networks are fascinating and powerful tools for image and audio recognition that can even achieve superhuman performance on many tasks. However, the networking domain involves completely different theoretical mathematical models compared to those found in the fields of computer vision and NLP. Convolutional layers or recurrent layers may not work effectively in the networking domain. In addition, networks involve far more data and stringent response time demands, which pose great challenges for ML deployment. Therefore, the demanding requirements and specific characteristics of the networking domain will require both the adaptation of existing algorithms and the development of new ones [7]. Thus, efforts to meet the needs of the networking domain, as a new application domain for ML, will drive advances in both the ML and networking domains to a new level [27].

7. Conclusion
In this article, we first explored two deployment models for an intelligent control plane in a network and discussed the unique advantages and disadvantages of the centralized and distributed paradigms. Then, we proposed a hybrid ML paradigm for packet routing, in which we combine distributed AI routers with a centralized network mind to address the needs of different network services. In our paradigm, we deploy a centralized AI control plane for tunneling-based routing and a hybrid AI architecture for hop-by-hop routing. In addition, we apply two kinds of RL algorithms to optimize the routing strategies.

References
[1] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, "Machine learning paradigms for next-generation wireless networks," IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
[2] H. Yao, T. Mai, X. Xu, P. Zhang, M. Li, and Y. Liu, "NetworkAI: An intelligent network architecture for self-learning control strategies in software defined networks," IEEE Internet Things J., vol. 5, no. 6, pp. 4319–4327, Dec. 2018.
[3] M. Wang, Y. Cui, X. Wang, S. Xiao, and J. Jiang, "Machine learning for networking: Workflow, advances and opportunities," IEEE Netw., vol. 32, no. 2, pp. 92–99, Apr. 2018.
[4] H. Yang, J. Wen, X.-J. Wu, L. He, and S. G. Mumtaz, "An efficient edge artificial intelligence multi-pedestrian tracking method with rank constraint," IEEE Trans. Ind. Informat., Feb. 2019. doi: 10.1109/TII.2019.2897128.
[5] S. Goudarzi, N. Kama, M. H. Anisi, S. Zeadally, and S. Mumtaz, "Data collection using unmanned aerial vehicles for internet of things platforms," Comput. Electr. Eng., vol. 75, pp. 1–15, May 2019.
[6] D. D. Clark, C. Partridge, J. C. Ramming, and J. T. Wroclawski, "A knowledge plane for the internet," in Proc. Conf. Applications, Technologies, Architectures, and Protocols for Computer Communications, Karlsruhe, Aug. 25–29, 2003, pp. 3–10.
[7] A. Mestres et al., "Knowledge-defined networking," ACM SIGCOMM Comput. Commun. Rev., vol. 47, no. 3, pp. 2–10, July 2017.
[8] J. A. Boyan and M. L. Littman, "Packet routing in dynamically changing networks: A reinforcement learning approach," in Proc. Int. Conf. Neural Information Processing Systems, Denver, 1993, pp. 671–678.
[9] S. P. Choi and D.-Y. Yeung, "Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control," in Advances in Neural Information Processing Systems, Denver, Dec. 2–5, 1996, pp. 945–951.
[10] S. Kumar and R. Miikkulainen, "Dual reinforcement Q-routing: An on-line adaptive routing algorithm," in Proc. Artificial Neural Networks in Engineering Conf., 1997, pp. 231–238.
[11] A. A. Bhorkar, M. Naghshvar, T. Javidi, and B. D. Rao, "Adaptive opportunistic routing for wireless ad hoc networks," IEEE Trans. Netw., vol. 20, no. 1, pp. 243–256, Feb. 2012.
[12] R. Arroyo-Valles, R. Alaiz-Rodriguez, A. Guerrero-Curieses, and J. Cid-Sueiro, "Q-probabilistic routing in wireless sensor networks," in Proc. Int. Conf. Intelligent Sensors, Sensor Networks and Information, Melbourne, Dec. 3–6, 2007, pp. 1–6.
[13] P. Stone, "TPOT-RL applied to network routing," in Proc. Int. Conf. Machine Learning, California, June 29–July 2, 2000, pp. 935–942.
[14] P. Stone and M. Veloso, "Team-partitioned, opaque-transition reinforcement learning," in Robot Soccer World Cup, Springer, 1998, pp. 261–272.
[15] D. Wolpert, K. Tumer, and J. Frank, "Using collective intelligence to route internet traffic," in Advances in Neural Information Processing Systems, Denver, Nov. 29–Dec. 4, 1999, pp. 952–960.
[16] J. Dowling, E. Curran, R. Cunningham, and V. Cahill, "Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing," IEEE Trans. Syst., Man, Cybern., vol. 35, no. 3, pp. 360–372, May 2005.
[17] A. Elwhishi, P. H. Ho, K. Naik, and B. Shihada, "ARBR: Adaptive reinforcement-based routing for DTN," in Proc. IEEE Int. Conf. Wireless and Mobile Computing, Networking and Communications, Ontario, Oct. 10–13, 2010, pp. 376–385.
[18] G. Stampa et al., "A deep-reinforcement learning approach for software-defined networking routing optimization," arXiv preprint, arXiv:1709.07080, Sept. 2017.
[19] S.-C. Lin, I. F. Akyildiz, P. Wang, and M. Luo, "QoS-aware adaptive routing in multi-layer hierarchical software defined networks: A reinforcement learning approach," in Proc. IEEE Int. Conf. Services Computing, San Francisco, June 27–July 2, 2016, pp. 25–33.
[20] P. Wang and T. Wang, "Adaptive routing for sensor networks using reinforcement learning," in Proc. IEEE Int. Conf. Computer and Information Technology, Seoul, Sept. 20–22, 2006, pp. 219–225.
[21] H. Yao, X. Chen, M. Li, P. Zhang, and L. Wang, "A novel reinforcement learning algorithm for virtual network embedding," Neurocomputing, vol. 284, pp. 1–9, Apr. 2018.
[22] X. Liang, Y. Li, G. Han, H. Dai, and H. V. Poor, "A secure mobile crowdsensing game with deep reinforcement learning," IEEE Trans. Inf. Forensics Security, vol. 13, no. 1, pp. 35–47, Jan. 2018.
[23] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," arXiv preprint, arXiv:1509.02971, Sept. 2015.
[24] S. Bock, J. Goppold, and M. Weiß, "An improvement of the convergence proof of the ADAM-optimizer," arXiv preprint, arXiv:1804.10587, Apr. 2018.
[25] K. Malialis, S. Devlin, and D. Kudenko, "Resource abstraction for reinforcement learning in multiagent congestion problems," in Proc. 2016 Int. Conf. Autonomous Agents and Multiagent Systems, Singapore, May 9–13, 2016, pp. 503–511.
[26] "Congestion-problems." [Online]. Available: https://github.com/rradules/congestion-problems
[27] H. Yao, C. Jiang, and Y. Qian, Developing Networks Using Artificial Intelligence. Springer, 2019.
[28] Y. Go, M. A. Jamshed, Y. Moon, C. Hwang, and K. Park, "APUNet: Revitalizing GPU as packet processing accelerator," in Proc. USENIX Symp. Networked Systems Design and Implementation, Boston, Mar. 27–29, 2017, pp. 83–96.
