
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/375624578

Transformer Masked Autoencoders for Next-Generation Wireless Communications: Architecture and Opportunities

Article in IEEE Communications Magazine · January 2023
DOI: 10.1109/MCOM.002.2300257

Authors: Abdullah Zayat, Mahmoud A. Hasabelnaby, Mohanad Obeed, Anas Chaaban

All content following this page was uploaded by Mahmoud A. Hasabelnaby on 31 January 2024.



Transformer Masked Autoencoders for Next-Generation Wireless Communications: Architecture and Opportunities

Abdullah Zayat, Mahmoud A. Hasabelnaby, Graduate Student Member, IEEE, Mohanad Obeed, Member, IEEE, Anas Chaaban, Senior Member, IEEE

Abstract—Next-generation communication networks are expected to exploit recent advances in data science and cutting-edge communications technologies to improve the utilization of the available communications resources. In this article, we introduce an emerging deep learning (DL) architecture, the transformer-masked autoencoder (TMAE), and discuss its potential in next-generation wireless networks. We discuss the limitations of current DL techniques in meeting the requirements of 5G and beyond-5G networks, how the TMAE differs from classical DL techniques, and how it can potentially address several wireless communication problems. We highlight various areas in next-generation mobile networks which can be addressed using a TMAE, including source and channel coding, estimation, and security. Furthermore, we present a case study showing how a TMAE can improve data compression performance and complexity compared to existing schemes. Finally, we discuss key challenges and open future research directions for deploying the TMAE in intelligent next-generation mobile networks.

Index Terms—6G, 5G, convolutional neural networks, deep learning, wireless communication, recurrent neural networks, transformer, masked autoencoder.

A. Zayat, M. A. Hasabelnaby, and A. Chaaban are with the School of Engineering, University of British Columbia, Kelowna, BC V1V 1V7, Canada. M. Obeed is with the Systems and Computer Engineering Department, Carleton University, Ottawa, ON K1S 5B6, Canada.

I. INTRODUCTION

Next-generation (NG) mobile networks are increasingly calling for intelligent architectures that support massive connectivity, ultra-low latency, ultra-high reliability, high quality of experience, high spectral and energy efficiency, and lower deployment costs [1]. One way to meet these stringent requirements is to rethink traditional communication techniques by exploiting recent advances in artificial intelligence.

Traditionally, functions such as waveform design, channel estimation, interference mitigation, and error detection and correction are developed based on theoretical models and assumptions. This traditional approach is not capable of adapting to new challenges introduced by emerging technologies. For instance, the pilot-based channel estimation technique, while efficient for MIMO systems with a few antennas and low-mobility users, is not efficient for massive multiple-input multiple-output (MIMO) systems or high-mobility users. Additionally, the plethora of communication protocols, technologies, and services that have been introduced to support the growing demand and diversity of use cases make it increasingly difficult to mathematically model wireless networks. As such, optimizing communication schemes using mathematical models becomes extremely challenging and computationally complex, particularly with the integration of demanding 6G applications and services, such as the Metaverse, ubiquitous Extended Reality (XR), intelligent connected robotics, large-scale intelligent reconfigurable surfaces (IRSs), and ultra-massive MIMO (UM-MIMO) networks.

Due to this, the use of deep learning (DL) approaches has been proposed to solve wireless communications challenges, owing to their ability to adapt to dynamic environments, approximate complex models, and utilize data to improve performance [2]. Transformer-enabled DL, initially proposed for natural language processing (NLP) tasks [3], opens the door for further advances in this area. The main advantage of transformers is their superior ability to learn complex dependencies between input features compared to classical deep neural networks (DNN) [4]. The goal of this paper is to discuss a transformer-based architecture, the transformer masked autoencoder (TMAE), and its application in communications. Particularly, we start by highlighting some limitations of existing DNNs (Sec. II), then we explain the general transformer and the MAE architectures (Sec. III). Next, we present a use case where a TMAE enhances data compression by exploiting semantics, which can significantly improve achievable rates in wireless networks (Sec. IV). Finally, we discuss opportunities for improving wireless communications using the TMAE (Sec. V) and summarize the takeaway message of the paper (Sec. VI).

II. CLASSICAL DNN LIMITATIONS IN NG NETWORKS

Recent advances in DL opened the possibility for designing intelligent mobile networks that learn to operate optimally using massive amounts of data [2], thus overcoming mathematical modeling and computational complexity challenges [5]. Although DL offers advantages over mathematical models, it is not free of limitations. Some common DNNs are discussed next, followed by their limitations in NG networks.

A. Common DNN Architectures

The most commonly used DNN architectures in wireless communications include the following:
• Multi-Layer Perceptrons (MLP): An MLP is a feed-forward neural network (NN) that consists of at least three layers of fully-connected nodes: an input layer,

a hidden layer, and an output layer. MLPs have been proposed to tackle various mobile network problems, such as beamforming and channel estimation [5].
• Convolutional Neural Networks (CNN): CNNs replace fully-connected layers with locally connected kernels that capture local correlations in data. This reduces the number of model parameters, which simplifies training and reduces the risk of overfitting. As a result, CNNs often outperform MLPs in many applications such as computer vision (CV). CNNs have also been used in many wireless communications applications such as semantic communications [2], beamforming, channel estimation, and channel state information (CSI) feedback [5].
• Recurrent Neural Networks (RNN): The main difference between a typical NN (MLP or CNN) and an RNN is that an RNN has feedback connections in addition to feed-forward connections. These connections give RNNs memory of previous inputs/outputs, which is useful for sequential processing and helps in exploiting local and global correlations. This makes them ideal for applications with time-series data [5].
• Long Short-Term Memory (LSTM) Networks: Although RNNs have memory, their memory is short. As a special type of RNN, LSTM networks feature gated memory cells, which extend their memory to longer sequences. Due to this, LSTM networks have been used for channel estimation in channels with memory [5].

B. Limitations in NG Networks

The aforementioned NNs can be employed to build encoding and decoding layers (an autoencoder) that produce a different representation of data and reconstruct it from the new representation, respectively. This can then be used to realize physical layer processing tasks such as signal design, channel estimation, CSI feedback, modulation, and coding [5].

However, classical DNNs encounter limitations in fully meeting the demands of NG networks. MLPs have limited ability to extract deep features from raw data, and their performance on sequential data is poor. This leads to challenges in generalizing and transferring learned knowledge between different scenarios, hindering their ability to seamlessly adapt and perform in the dynamically changing environments that are common in wireless networks. While CNNs, RNNs, and LSTM networks are more adaptable spatially/temporally to local/short-term changes, they have limited capability to effectively exploit global/long-term dependencies in sequential data, which is crucial for capturing the intricate patterns and dynamics inherent in NG wireless networks. In addition, training recurrent networks (RNN and LSTM) suffers from challenges related to convergence, vanishing gradients, and parallelization, which limits their usefulness in latency-sensitive applications.

These limitations underscore the need for innovative enhancements, alternative architectures, and hybrid approaches to effectively address the evolving requirements of NG networks. Recently, attention-based DL realized using transformer networks was proposed and shown to achieve remarkable performance gains in various CV and NLP applications compared to classical DL [3]. Next, we present the basic architecture of transformers and discuss their potential compared to classical DNNs. The potential of the TMAE, a transformer-based autoencoder, for NG networks is discussed afterward.

III. POTENTIAL OF TRANSFORMER-BASED NNS

A transformer is an NN architecture proposed originally for NLP [3]. Owing to their remarkable ability to capture complex patterns and relationships in data, transformers have been adapted for various applications, including CV and wireless communications. This is because they have several advantages over classical DNNs. First, transformers employ the attention mechanism, which allows them to dynamically weigh the importance of segments in the data and thereby capture short- and long-term dependencies. This is particularly beneficial in time-series analysis where extended sequences of data points are involved, enhancing their capability of generalizing and adapting to changing environments. Second, transformers take advantage of parallel processing, which significantly improves their efficiency. Finally, pre-trained transformer models, such as BERT [3], can be fine-tuned for customized objectives using a small dataset (realizing transfer learning). These advantages make transformers attractive for challenging DL tasks, including wireless communication applications. This section delves into the workings of transformers and examines their key components. It also examines the architecture of the TMAE.

A. Transformer Architecture

There is a variety of transformer architectures depending on the application. However, all architectures share the same fundamental principle: the attention mechanism. The main components of transformers, as shown in Fig. 1, are input embedding, positional encoding, and multi-head attention, which are discussed next.

Input Embedding: First, the input vector is segmented and projected into the embedding space (usually of higher dimension than the input). This can be achieved using a single convolutional layer, for example. The result of this step is a representation in the embedding space of segments of the input vector, each of which is characterized by a position.

Positional Encoding: Since the transformer encoder has no recurrence (unlike an RNN), it is essential to add segment position information to the data. This is accomplished with positional encoding. There are numerous methods for positional encoding, one of which employs trigonometric functions. In this case, each odd-indexed segment is encoded using samples of a cosine function with a frequency that depends on the position, effectively encoding this positional information in the generated vector. Similarly, samples of a sine function are used to encode the positions of even-indexed segments. These positional encoding vectors are then added to the input embeddings of their respective segments.

Multi-head Attention: Multi-head attention is the most important component of transformers and plays a crucial role in quantifying the relationships between the inputs. This is achieved using the self-attention mechanism, which relates

Fig. 1. Transformer Architecture (reproduced from [3]).

Fig. 2. A multi-head attention block where the scaled dot-product attention is applied h times.

the inputs with different positions in a sequence to provide a comprehensive representation of that sequence. Let the resulting vector obtained after positional encoding of segment i ∈ {1, . . . , n} be denoted by x_i, and let X = [x_1, . . . , x_n]. As shown in Fig. 2, three NNs are used to generate three matrices from X, namely the key K, the query Q, and the value V, each column of which represents one segment. Then, an attention map is generated by calculating the product Q^T K and applying a soft-max function to map the resulting values to probabilities. The probabilities are then used as weights multiplied by V to obtain a self-attended feature map for X. To construct multi-head attention, this attention mechanism is applied multiple times in parallel, and the resulting outputs are concatenated and projected again to obtain the final result. The rationale behind using multiple attention blocks is to enable the attention function to extract information from different perspectives (queries) and capture the complex relationships between segments. Residual connections and layer normalization are used to improve stability and performance, and a feed-forward NN is used to extract more complex features.

To attend to relations between previous transformer outputs and current inputs, a similar mechanism is applied to generate self-attended feature maps from previous outputs, and then to generate an attention map between the current input and previous outputs (right-most blocks in Fig. 1). Finally, the result is converted to a representation that depends on the task at hand (translation, prediction, classification, etc.).

Based on this architecture, the authors of [6] proposed the so-called Vision Transformer (ViT) for CV applications. Several other transformer-based architectures were proposed, including the TMAE [7] discussed next.

B. The TMAE Architecture

A TMAE is a transformer-based architecture that aims to reconstruct data using only partial observations. Fig. 3 shows

Fig. 3. Masked Autoencoder Architecture: Encoding is performed on the small subset of visible patches. The masked portions of the image are added after the encoder, and a decoder reconstructs the original image from the complete set of encoded patches and mask tokens [7].

an example where the data is an image and the transformer-based architecture is a ViT. The TMAE consists of an encoder and a decoder.

TMAE encoder: The encoder in a TMAE is designed to convert partial observations of the input data into a latent representation. It is realized using a transformer-based NN applied on data with masked segments. The transformer-based NN embeds the observed segments, adds their positional information, and then passes the result through multiple multi-head attention and DNN blocks as explained above (Fig. 1).

TMAE decoder: The TMAE decoder uses the latent representation and the positions of the masked segments to reconstruct the original data, as shown in Fig. 3.

Note that the TMAE encoder is usually narrower than the TMAE decoder, since the former operates on the observed segments only whereas the latter operates on both the observed and masked ones. Also, the TMAE encoder is usually deeper than the TMAE decoder because it needs to learn the semantics and correlations of the data from the observed segments. Hence, the encoder and decoder are asymmetric.

The potential of the TMAE in NG networks is highly promising, offering a novel paradigm to address complex challenges. By combining the power of transformers with the reconstruction capabilities of an autoencoder, the TMAE framework can efficiently process sequential data and capture intricate temporal dependencies inherent in wireless channels. This makes it applicable for enhancing resource efficiency, adapt-

ability, and robustness in tasks such as signal processing, channel estimation, and resource allocation, contributing to the evolution of resilient and high-performance wireless networks.

C. Transformer Challenges

The superiority of transformer-based NNs comes at the cost of some challenges. Their parallel nature increases the resources needed for them to run. Transformers also require a substantial amount of labeled data for training, often necessitating pre-training on large corpora before fine-tuning on the target task. Additionally, fine-tuning pre-trained transformers on specific tasks requires careful balancing to avoid catastrophic forgetting and retain previously learned knowledge [8]. Note, however, that some of these challenges are shared with classical DNNs. A comparison between classical DNNs and transformer-based NNs is given in Table I, providing a concise overview of their strengths and limitations, which helps in understanding their suitability for different tasks.

In the next section, we provide a case study illustrating how a TMAE enhances the compression rate of existing compression schemes.

IV. CASE STUDY: TMAE-ENHANCED COMPRESSION

In some communication applications, communicating nodes have limited computational resources and are deployed in resource-constrained environments. Examples include Unmanned Aerial Vehicle (UAV) and Internet of Things (IoT) applications. Yet, the demand for transmitting vast amounts of data (whether it be images, sensor readings, or text) persists. Given the limited resources in these systems, transmitting vast amounts of data becomes challenging. Fortunately, such data often exhibits inherent correlations between its segments. For instance, in an image, certain patches can be inferred from their neighboring patches due to the visual structure and semantics of the scene. Similarly, in a sequence of sensor readings, some values can be predicted based on the preceding and succeeding values. Thus, one can omit certain segments of the data, leveraging the correlations to infer these segments at the receiver's end. This approach not only reduces the amount of data to be transmitted but also requires less processing resources at the transmitter, making it particularly advantageous for systems like UAVs capturing and transmitting real-time images or IoT devices sending frequent sensor updates.

A. Example: UAV with TMAE-Enhanced Compression

Consider a UAV operating in a distant, hard-to-reach region, tasked with gathering essential information about the landscape through imaging. The primary challenge for this UAV is not merely capturing the images but efficiently transmitting them back to a base station (BS). In this scenario, the communication channel quality is poor, limiting the amount of information that can be sent. This requires compressing the image aggressively, which can compromise its meaningful interpretation. Conventional compression techniques may fail in this scenario. Fig. 4 shows an example where a standard compression scheme produces a very low-quality image when the compression (in bits per pixel (BPP)) is low.

TMAE-enhanced compression offers a distinct advantage. Once an image is captured, the UAV segments it into non-overlapping patches and randomly masks some of them, relying on the ability of the BS to infer them from unmasked patches due to the potentially high correlation between patches. Once the patches are masked and removed, the remaining patches are stacked to form a condensed image, which is then further compressed using a standard compression algorithm. Upon receipt at the BS, the compressed image is decompressed to retrieve the stacked patches, and the TMAE, trained to handle random masks, reconstructs the masked patches to reproduce the complete image. For accurate reconstruction, the TMAE requires information about the locations of the masked patches. This mask can be obtained by sharing the seed of the random number generator used to generate the mask (or can be stored at the receiver if it is deterministic). While the resulting image might not be of the highest quality, it conveys essential and interpretable information from the UAV despite the communication challenges, as can be seen in Fig. 4. Next, we examine the proposed scheme quantitatively when JPEG is used as the standard compression algorithm.

B. JPEG-TMAE Compression

The proposed JPEG-TMAE compression scheme is a combination of the widely-used JPEG compression and the TMAE, as shown in Fig. 4. To implement this, we use patches of size 16 × 16 pixels (as an example). Then, we use a masking ratio of Rmask = 0.67 (i.e., 2/3 of the image is masked) and random masking with a seed that is shared between the UAV and the BS. Thus, the fraction of the image retained after masking is 1 − Rmask, i.e., the masking stage alone compresses the image by a factor of 1/(1 − Rmask). The remaining patches are stacked and compressed using JPEG. As such, the overall compression rate equals (1 − Rmask) multiplied by the JPEG compression rate. Upon receiving the compressed image, the BS decompresses it using JPEG, and then uses a pre-trained TMAE to reconstruct the image.

Leveraging the pre-trained transformer-based model from Facebook AI Research (FAIR) [7], we bypassed extensive training, focusing instead on architectural modifications tailored to our compression goals. This ensured a good balance between computational efficiency and performance in the described UAV communication scenario.

Fig. 4 shows a comparison where JPEG and the proposed JPEG-TMAE schemes are used to compress an image, using compression rates of 0.41 and 0.255 BPP for JPEG, and a compression rate of 0.255 BPP for the JPEG-TMAE scheme. The JPEG scheme fails at the compression rate of 0.255 BPP, contrary to the proposed JPEG-TMAE, which reconstructs the image with a much better quality at the same overall compression rate (a quality comparable with JPEG at 0.41 BPP).

For a quantitative comparison, we can use the structural similarity measure (SSIM) to assess the quality of the reconstructed images [9]. SSIM gauges the similarity between the input and output images, factoring in the interpixel dependencies of closely situated pixels. Recognized as a full reference

TABLE I
COMPARATIVE EVALUATION OF DEEP LEARNING TECHNIQUES BASED ON KEY CHARACTERISTICS

Aspect                          | MLP             | CNN           | RNN       | LSTM      | Transformer
--------------------------------|-----------------|---------------|-----------|-----------|----------------
Architecture                    | Fully Connected | Convolutional | Recurrent | Recurrent | Attention-based
Sequence Modeling               | No              | Partial       | Yes       | Yes       | Yes
Long-term Dependencies          | Limited         | Limited       | Yes       | Moderate  | Improved
Parallel Processing             | No              | Yes           | No        | No        | Yes
Spatial Invariance              | No              | Yes           | No        | No        | No
Parameter Sharing               | No              | Yes           | Yes       | Yes       | Yes
Interpretability                | Low             | Moderate      | Low       | Moderate  | Moderate
Computational Complexity        | Low             | Moderate      | Low       | High      | High
Memory and Storage Requirements | Low             | Low           | Moderate  | High      | High
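To make the "Attention-based" column of the table concrete, the scaled dot-product attention at the core of a transformer head can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; rows represent segments here, so the attention map is computed as QK^T rather than the column-wise Q^T K convention used in the text, and the projection matrices Wq, Wk, Wv stand in for the three NNs of Fig. 2.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh the value vectors V by a softmax over query-key similarity scores."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) attention map
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row is a probability vector
    return weights @ V                              # self-attended feature map

# Toy example: 6 segments embedded in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (6, 8)
```

Multi-head attention repeats this block h times with separate projection matrices and concatenates the resulting outputs, as in Fig. 2.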

Fig. 4. Remote UAV imaging: A qualitative comparison between the conventional JPEG compression and the proposed JPEG-TMAE compression scheme. [Block diagram: on the transmitter side, the standard encoder applies JPEG to the captured image, while the proposed encoder applies a masking layer followed by JPEG; on the receiver side (base station), the standard decoder applies JPEG decoding, while the proposed decoder applies JPEG decoding followed by a ViT encoder and decoder.]

metric, SSIM measures image quality using an uncompressed or noise-free initial input image as its reference. Thus, we examine the SSIM of the proposed JPEG-TMAE on the Kodak dataset [10] (often used to compare compression methods). By iterating over the masking ratio and adjusting the JPEG compression rate in every iteration, we obtained an optimized performance curve across all examined masking ratios. We compare with several leading models, specifically mbt2018 (CNN-based) [11], cheng2020-anchor (CNN-based) [12], and ConvLSTM (CNN-LSTM-based) [13].

Fig. 5 displays the SSIM in relation to the overall compression rate. Notably, the JPEG-TMAE not only outperforms JPEG, particularly at lower compression rates, but also exhibits superior performance compared to leading models such as mbt2018, cheng2020-anchor, and ConvLSTM at these rates. At moderate compression rates, it matches the performance of these models. An added advantage is its design approach: while the aforementioned methods are based on autoencoders and thus require part of the DNN architecture to be incorporated at the UAV, increasing its complexity, our JPEG-TMAE integrates the entire NN at the BS, thus simplifying processing at the UAV. Note that while the ConvLSTM method performs best at higher compression rates, we expect a similar transformer-based architecture to perform even better.

In summary, this approach is highly suitable for communication applications with limited resources at the transmitter side (UAVs, IoT, etc.). Note that the results in Fig. 5 can be further improved using dynamic masking to ensure sufficient correlation between masked/unmasked patches, leading to either a lower compression rate or better image quality.

V. APPLICATIONS IN NG NETWORKS

As demonstrated above, the TMAE can be used efficiently to reduce the size of an image and reconstruct it using its semantics. This improves source coding and consequently increases the throughput of wireless communications. In this section, we discuss applications of the TMAE in various areas of wireless communication.
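The transmitter-side masking step at the heart of the case study above can be sketched as follows. This is an illustrative NumPy sketch under the stated parameters (16 × 16 patches, Rmask = 0.67), not the authors' implementation; the JPEG coding of the stacked patches and the TMAE reconstruction at the BS are omitted, and the helper name `mask_and_stack` is hypothetical.

```python
import numpy as np

def mask_and_stack(img, patch=16, mask_ratio=0.67, seed=0):
    """Split img into non-overlapping patches, drop a random mask_ratio of them,
    and stack the survivors; the shared seed lets the BS rebuild the same mask."""
    H, W, C = img.shape
    ph, pw = H // patch, W // patch
    patches = (img.reshape(ph, patch, pw, patch, C)
                  .swapaxes(1, 2)
                  .reshape(ph * pw, patch, patch, C))
    rng = np.random.default_rng(seed)           # seed shared between UAV and BS
    n_keep = round(ph * pw * (1 - mask_ratio))  # fraction kept: 1 - Rmask
    kept_idx = np.sort(rng.permutation(ph * pw)[:n_keep])
    return patches[kept_idx], kept_idx          # condensed data + patch positions

# A 224x224 RGB image has 14x14 = 196 patches; about one third survive masking.
img = np.zeros((224, 224, 3), dtype=np.uint8)
kept, idx = mask_and_stack(img)
```

At the receiver, the BS regenerates `kept_idx` from the shared seed, places the decompressed patches back at those positions, and hands the partially filled image to the TMAE for reconstruction.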

0.5 for reducing pilot signal transmission overhead due to its


0.45 superiority in reconstructing signals from partial observations.
0.4 For example, in OFDM, the channel time-frequency re-
0.35 sponse can be envisioned as a 2D image (with colors rep-
0.3 resenting the real and imaginary parts) [15]. The TMAE can
be exploited for reducing the number of estimated channels,
SSIM

0.25 by estimating the channels of some time-frequency resource


blocks, and inferring the rest using the TMAE by treating them
0.2
as masked patches in the 2D image. This idea can also be ap-
plied to multiple antenna systems where the channels’ spatial,
0.15 JPEG temporal, and spectral characteristics need to be estimated.
JPEG-TMAE (Optimized)
ConvLSTM (CNN-LSTM Based) [13]
Another example is channel estimation for reconfigurable
cheng2020-anchor(CNN based) [12] intelligent surfaces (RIS), which are surfaces that can be
0.1
mbt2018(CNN Based) [11] mounted on buildings to reflect signals in a desired direction
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 and improve channel quality. Despite their advantages, they
Bit per pixel (BPP) require large channel estimation overheads. A TMAE can be
used to reduce this overhead, by estimating parts of the RIS
Fig. 5. Comparison of the JPEG-TMAE compression scheme with conven-
tional JPEG compression and state-of-the-art models on the Kodak dataset. channel matrices, and exploiting the spatial correlation in these
matrices to reconstruct the rest of the channel using a TMAE.

A. Semantic Source and Channel Coding C. Privacy and Security


In Sec. IV, we present an application of the TMAE for Data collection is a major concern for NG networks.
image compression. In general, the source coding benefits For applications involving DL, collaborative learning in the
of TMAEs extend beyond image compression. The proposed cloud provides a means for processing large amounts of data
TMAE-enhanced compression can be extended to video data collected from distributed nodes. However, this comes at
for instance. However, for video data, the masks will have the expense of privacy when the application involves using
both space and time dimensions. Similarly, TMAEs can be sensitive data or when the trained model is shared. Intelligent
combined with standard compression schemes as in Sec. IV NG networks should adopt privacy-by-design approaches that
to improve the compression of text or audio data. This source are service-oriented and privacy-preserving. Several privacy
coding potential can increase the throughput of NG networks. challenges can be addressed using a TMAE to build generative
In addition to source coding, TMAEs can be beneficial in models to construct privacy-preserving datasets, where data
channel coding. Autoencoders in general have been used in privacy and utility can be ensured simultaneously. For instance,
the literature for various transmission processing applications a TMAE can be used to encrypt user data before it is
including channel coding and modulation [2]. As an autoen- transmitted over the network, so that it is only accessible to
coder with attention, the TMAE can be superior in such tasks. authorized users who have access to the appropriate mask.
In particular, the attention mechanism gives TMAEs an edge A TMAE can also be useful for detecting potential for
in joint source-channel coding applications [14]. Also, their attacks on the network infrastructure itself. It can be used
ability to learn long-term dependencies can be beneficial for to analyze network traffic and detect anomalies or suspicious
communications over changing communication environments. patterns with higher accuracy than classical DNNs. In general,
Combining TMAEs with various NN architectures or classical by leveraging the power of the TMAE, more effective and
coding schemes (LDPC, polar codes) to achieve superior efficient solutions for securing networks and protecting users’
performance is a promising research direction. data can be developed.

B. Channel Estimation and Prediction

One stringent requirement in NG networks is to guarantee extremely low latency. One way to achieve this is to shrink the packet size, which reduces the transmission, decoding, and computation time. However, the pilot sequences used for channel estimation increase the packet size. In addition, in dynamic environments where users are highly mobile, the channels change rapidly, which requires more frequent pilot signal transmission. Researchers have used NNs for channel estimation/prediction to mitigate this problem [15]. However, the complexity and dynamicity of wireless channels limit the capability of existing NNs to improve channel estimation performance. We envision the TMAE to be a strong candidate for this task.

VI. CHALLENGES AND OPPORTUNITIES OF TMAE IN NG NETWORKS

Although the integration of TMAEs into NG networks holds transformative promise, it also brings to the forefront a series of multifaceted challenges and open problems. These intricacies arise from the combined complexity of transformer models and dynamic wireless networks. In this section, we discuss the current challenges and future research directions of TMAEs in NG networks.

• Computational resources: Although the TMAE benefits from parallel processing to achieve remarkable performance, this comes at the cost of computational resources (graphics processing units (GPUs)). In the case study above, we benefited from the ability to migrate the TMAE

processing to the BS, which can be equipped with a GPU. However, if the processing has to take place on a resource-constrained device for latency reasons (such as an autonomous vehicle), the device must be equipped with expensive hardware (computational resources and memory), which may not always be practical. The lack of such hardware will reduce the performance or increase the latency of TMAEs, which limits their use in real-time applications. Resolving this limitation requires innovative techniques for distributing the TMAE between devices and edge clouds (federated transformer-based NNs). Moreover, like other DNNs, the scalability of the TMAE to large-scale communication networks presents challenges in terms of computational resources.

• Energy consumption: In mobile and battery-operated wireless devices, energy efficiency is a critical concern. The computational complexity and memory requirements of TMAEs (and, more generally, transformer-based NNs) can lead to high energy consumption, making them less energy-efficient for resource-constrained devices. For energy-critical wireless communication networks, researchers need to explore model compression techniques, data quantization methods, and hardware acceleration to reduce energy consumption while maintaining satisfactory performance.

• Pre-trained models: Pre-trained transformers have shown extraordinary performance in CV and NLP, as well as excellent adaptability across the applications therein. This property of transformers is expected to transfer well to wireless communications applications. However, existing pre-trained transformer models (trained on images or text) are not optimized for wireless communications data, and fine-tuning them on such data may not provide the best performance. On the other hand, transformer models trained on wireless communications data do not yet exist. There is a great opportunity for researchers to develop pre-trained transformer models trained on wireless communications data (such as channel and resource allocation data) and to demonstrate the adaptability of these models across different wireless applications with minimal training requirements. This also highlights another opportunity: the development of a large dataset of wireless communications data for such purposes. The availability of diverse and abundant data from wireless communications applications will enable efficient training and improved generalization within these applications.

In summary, incorporating TMAEs into future wireless communication systems holds unparalleled potential, but addressing these intertwined challenges and open problems is fundamental to their seamless deployment. By collectively tackling these intricacies, the wireless communication community can usher in a new era of adaptive, efficient, and secure communication networks.

VII. CONCLUSION

In this article, we discussed the limitations of using classical deep-learning methods in wireless networks. We then presented the architecture of transformers and masked autoencoders and discussed their distinct capabilities compared to traditional deep-learning methods. We also showed an application of transformer-masked autoencoders in data compression, which yielded a significant improvement compared to classical approaches. We explored and presented some applications and open research problems where transformer-masked autoencoder-based solutions can be developed to produce intelligent wireless communication systems. In general, transformer-based neural networks are expected to play an important role in next-generation communication networks, and there are many challenges and opportunities in this area for the research community to explore.

REFERENCES

[1] Y. Xu, G. Gui, H. Gacanin, and F. Adachi, "A survey on resource allocation for 5G heterogeneous networks: Current research, future trends, and challenges," IEEE Commun. Surveys Tuts., vol. 23, no. 2, pp. 668–695, 2021.
[2] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, "Application of machine learning in wireless networks: Key techniques and open issues," IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3072–3108, 2019.
[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS'17), Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 6000–6010.
[4] X. Chen, Y. Hu, Z. Dong, P. Zheng, and J. Wei, "Transformer operating state monitoring system based on wireless sensor networks," IEEE Sensors J., vol. 21, no. 22, pp. 25098–25105, 2021.
[5] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, "Deep learning in physical layer communications," IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, 2019.
[6] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in Int. Conf. Learning Representations, virtual, 2021.
[7] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked autoencoders are scalable vision learners," in IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15979–15988.
[8] M. Li, P. Ling, S. Wen, X. Chen, and F. Wen, "Bubble-wave-mitigation algorithm and transformer-based neural network demodulator for water-air optical camera communications," IEEE Photonics J., pp. 1–10, 2023.
[9] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
[10] Kodak, "Kodak lossless true color image suite," http://r0k.us/graphics/kodak/, 1999.
[11] D. Minnen, J. Ballé, and G. Toderici, "Joint autoregressive and hierarchical priors for learned image compression," in Annual Conf. Neural Inf. Processing Systems (NeurIPS), Montréal, Canada, Dec. 2018, pp. 10794–10803.
[12] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, "Learned image compression with discretized Gaussian mixture likelihoods and attention modules," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2020.
[13] G. Toderici, D. Vincent, N. Johnston, S. J. Hwang, D. Minnen, J. Shor, and M. Covell, "Full resolution image compression with recurrent neural networks," CoRR, vol. abs/1608.05148, 2016. [Online]. Available: http://arxiv.org/abs/1608.05148
[14] Y. Jiang, H. Kim, H. Asnani, S. Kannan, S. Oh, and P. Viswanath, "Joint channel coding and modulation via deep learning," in IEEE 21st Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), Aug. 2020, pp. 1–5.
[15] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, "Deep learning-based channel estimation," IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, 2019.

Abdullah Zayat (S’21) received the B.S. degree in


electrical engineering from King Fahd University of
Petroleum and Minerals (KFUPM), Saudi Arabia, in
2021. Between 2019 and 2020, he was a Research
Intern at King Abdullah University of Science and
Technology (KAUST). He then obtained his M.A.Sc.
degree in electrical engineering from the University
of British Columbia (UBC), Canada, in 2023. Cur-
rently, he is a PhD student in Electrical Engineer-
ing at UBC. His research interests focus on signal
processing, communications, and the application of
machine/deep learning in communications.

Mahmoud A. Hasabelnaby (S’20) received the B.S.


(Hons.) and M.S. degrees in electronics and elec-
trical communications engineering from Menoufia
University, Menouf, Egypt, in 2014 and 2019, re-
spectively. He is currently working toward a Ph.D.
degree in electrical engineering at the University
of British Columbia, Okanagan Campus, Kelowna,
Canada. He is on leave from the Faculty of Elec-
tronic Engineering, Menoufia University, Menouf,
Egypt. From 2016 to 2018, he was a Research Assis-
tant at the National Telecommunication Regulatory
Authority, Egypt. His research interests include wireless communications,
information theory, ML/AI, end-to-end cloud-native software development,
and next-generation wireless access networks.

Mohanad Obeed received the B.Eng. degree in


computer and communication engineering from Taiz
University, Taiz, Yemen, in 2008, the M.Sc. and
the Ph.D. degree in electrical engineering from
King Fahd University of Petroleum and Minerals
(KFUPM), Dhahran, Saudi Arabia, in 2016 and
2019, respectively. From July 2017 to July 2019,
he was a visiting researcher at King Abdullah Uni-
versity of Science and Technology (KAUST) under
the supervision of Mohamed-Slim Alouini. He was
a Postdoctoral Research Fellow with the School of
Engineering at the University of British Columbia, Canada, from 2019 to
2022. He joined Carleton University, as a postdoctoral research fellow, in
2023. His research interests include satellite communication, 5G and 6G
networks, channel estimation, deep learning, and federated learning.

Anas Chaaban (S’09 - M’14 - SM’17) received


the Maîtrise ès Sciences degree in electronics from
Lebanese University, Lebanon, in 2006, the M.Sc.
degree in communications technology and the Dr.-Ing.
(Ph.D.) degree in electrical engineering and
information technology from the University of Ulm
and the Ruhr-University of Bochum, Germany, in
2009 and 2013, respectively. From 2008 to 2009,
he was with the Daimler AG Research Group on
Machine Vision, Ulm, Germany. He was a Research
Assistant with the Emmy-Noether Research Group
on Wireless Networks, University of Ulm, Germany, from 2009 to 2011, which
relocated to the Ruhr-University of Bochum in 2011. He was a PostDoctoral
Researcher with the Ruhr-University of Bochum from 2013 to 2014, and with
King Abdullah University of Science and Technology from 2015 to 2017. He
joined the School of Engineering at the University of British Columbia as
an Assistant Professor in 2018. His research interests are in the areas of
information theory and wireless communications.
