
Article

Sparse Sliding-Window Kernel Recursive Least-Squares Channel Prediction for Fast Time-Varying MIMO Systems

Xingxing Ai 1, Jiayi Zhao 2, Hongtao Zhang 2 and Yong Sun 2,*

1 ZTE Corporation, Algorithm Department, Wireless Product R&D Institute, Wireless Product Operation Division, Shenzhen 518057, China
2 Key Laboratory of Universal Wireless Communications, Ministry of Education of China, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Correspondence: sunyong@bupt.edu.cn

Abstract: Accurate channel state information (CSI) is important for MIMO systems; in high-speed scenarios in particular, fast time-varying CSI tends to become outdated, and changes in CSI show complex nonlinearities. The kernel recursive least-squares (KRLS) algorithm, which offers an attractive framework for dealing with nonlinear problems, can be used to predict nonlinear time-varying CSI. However, the network structure of the traditional KRLS algorithm grows as the training sample size increases, resulting in insufficient storage space and increasing computation when dealing with incoming data, which limits the online prediction capability of the KRLS algorithm. This paper proposes a new sparse sliding-window KRLS (SSW-KRLS) algorithm where a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary; the sample to discard is then determined in combination with its corresponding output to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm maintains the size of the kernel dictionary within the sample budget, requires a fixed amount of memory and computation per time step, incorporates regularization, and achieves online prediction. Moreover, in order to sufficiently track strongly changeable dynamic characteristics, a forgetting factor is considered in the proposed algorithm. Numerical simulations demonstrate that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieves superior performance in terms of both predictive accuracy and kernel dictionary size compared with the ALD-KRLS algorithm. Our proposed SSW-KRLS algorithm with M = 90 achieved an NMSE about 2 dB lower than that of the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller when the speed of the mobile user was 120 km/h.

Keywords: channel prediction; time-varying channels; MIMO system; kernel methods; recursive least squares

Citation: Ai, X.; Zhao, J.; Zhang, H.; Sun, Y. Sparse Sliding-Window Kernel Recursive Least-Squares Channel Prediction for Fast Time-Varying MIMO Systems. Sensors 2022, 22, 6248. https://doi.org/10.3390/s22166248

Academic Editor: Peter Han Joo Chong

Received: 10 June 2022; Accepted: 13 August 2022; Published: 19 August 2022

1. Introduction

Multiantenna technology can fully use spatial dimension resources and dramatically improve the capacity of wireless communication systems without increasing transmission power and bandwidth [1]. Meanwhile, beamforming technology [2,3] is widely used to reduce the interference between cochannel users with cooperative transmission and reception, since it can compensate for the channel fading and distortion caused by the multipath effect. Base stations optimize the allocation of radio resources through reasonable precoding, rendering the desired signal and interference more orthogonal, provided that CSI is known at the base station. Thus, the acquisition of CSI is very important in the cooperation of transmission and reception [4]. However, due to the dynamics of the channel, especially when the terminal is moving at a high speed, the acquisition of CSI is a formidable problem in MIMO systems.


In TDD systems, the user sends a sounding reference signal (SRS), and the base station
performs channel estimation algorithms such as LS [5] and MMSE [6]. Then, the obtained
CSI is used for downlink beamforming to realize cooperative signal processing. The
coherence time of wireless channels is the time duration after which CSI is considered to be
outdated. When the terminal is moving at a high speed, the Doppler frequency shift grows,
and the time variability of the channel is severe, which leads to the shortening of channel
coherence time. The measured uplink channel CSI cannot represent the real channel state
of downlink slots, resulting in the mismatch between the downlink beamforming designed
according to the measured CSI and the actual channel. In [1], with a typical CSI delay of 4 milliseconds, a user terminal speed of 30 km/h led to as much as a 50% performance reduction compared with the low-mobility scenario at 3 km/h.
In order to overcome the performance degradation caused by severely time-varying channels, [7] proposed a tractable user-centric CSI model and a robust beamforming design by taking deterministic equivalents [8] into account. However, when the user terminal moves at high speed, the channel shows nonstationary characteristics, and the statistical characteristics of the channel also change with time, so such statistical models cannot be used for the beamforming of high-speed time-varying channels.
Another approach is to use a channel prediction algorithm [9–16] to obtain more accurate CSI for beamforming. The kernel method is widely used in channel prediction algorithms due to its ability to track nonlinear channels and its adaptation to time-varying channels [17–22]. However, algorithms based on kernel methods face the problem in online prediction that the network structure grows as the size of the training set grows over time. Though researchers have proposed sparseness methods to limit the number of samples by setting prerequisites for new samples added into the dictionary, the size of the kernel dictionary cannot be precisely controlled. Therefore, this paper proposes a new channel prediction algorithm based on the kernel method that maintains the size of the kernel dictionary within a fixed budget while precisely tracking the fast time-varying dynamic characteristics.

1.1. Related Work


State-of-the-art channel fading prediction algorithms were reviewed in [9]. Parametric radio-channel-based methods [10] assume that the channel changes faster than its multipath parameters, so estimating these parameters helps extrapolate the channel into the future. However, the effective time of static multipath parameters is inversely proportional to the terminal moving speed, rendering channel prediction based on radio parameters inappropriate for high-speed scenarios. Autoregressive (AR) model-based methods [11,13] do not explicitly model physical scattering; they treat the time-varying channel as a stochastic wide-sense-stationary (WSS) process and use its temporal autocorrelation function for prediction. Nevertheless, they are not capable of predicting ephemeral variations in non-wide-sense-stationary (NWSS) channels due to their linear correlation assumption. When treating NWSS channels, many studies have attempted to use machine learning for channel prediction. The authors in [14] proposed a backpropagation (BP) framework for channel prediction in backscatter communication networks that considers both spatial and frequency diversity. In [15], the authors developed a machine-learning method for predicting a mobile communication channel on the basis of a specific type of convolutional neural network. Although these algorithms can achieve good performance, they need to build neural networks, require a large number of training samples, and have high complexity. The channel prediction method based on a support vector machine [16,23] uses Mercer's theorem [24] to map the channel sample space to a high-dimensional space and performs linear regression there, which solves the problem of tracking nonlinear channels well. The kernel recursive least-squares (KRLS) algorithm [17–22] is a nonlinear version of the recursive least-squares (RLS) algorithm. It can not only solve nonlinear problems but also adaptively iterate the model parameters.

However, the network structure of the KRLS algorithm grows with the training data, which limits its online use. In order to solve this problem and render the online kernel method feasible, researchers have proposed various sparseness methods or criteria, such as approximate linear dependency (ALD) [17], the novelty criterion (NC) [20], the surprise criterion (SC) [21], and the coherence criterion (CC) [22]. On the basis of these sparseness methods or criteria, only new input samples that meet the prerequisites are added to the dictionary. However, these sparse methods cannot precisely control the size of the kernel dictionary, which motivated us to introduce a sliding window to control the size of the kernel dictionary. Recently, some KRLS-based algorithms with an inserted forgetting factor have achieved better performance than that of the QKRLS [25,26] and ALD-KRLS algorithms, which motivated us to insert a forgetting factor into the proposed SSW-KRLS algorithm. This paper proposes a new sparse sliding-window KRLS algorithm where a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary; the sample to discard is then determined in combination with its corresponding output to achieve dynamic sample updates.

1.2. Contributions
The main contributions of this paper are summarized as follows:
• We propose a novel sparse sliding-window KRLS algorithm. To precisely control the number of samples, we introduce a sample budget as a size restriction. When the dictionary is smaller than the sample budget, we directly add the new sample to the dictionary. Otherwise, we choose the best sample to discard according to our proposed new criterion.
• To differentiate the value of samples collected at different times, we introduce a forgetting matrix. By setting different forgetting values for samples collected at different times, we quantify the time value of the samples. An older sample has a smaller forgetting value, which means that its time value is smaller. In this way, we consider both the correlation of samples and their time value when discarding old samples.
• Regarding our new method for discarding old samples, we set a candidate set from which we decide which sample to discard. The candidate set is obtained by adding the samples whose kernel function values with the new sample exceed a threshold and which are therefore highly correlated with the new sample. Then, we conduct a weighted estimation of the output value of these samples and decide which sample to discard on the basis of the deviation between its output and the estimated value.

2. System Model
We consider two typical user movement scenarios: an urban road, where the moving speed of users is 60 km/h, and a highway, where the moving speed of users is 120 km/h. As shown in Figure 1, in TDD systems, the user sends an SRS, and the base station runs channel estimation algorithms such as LS and MMSE. The channel matrix is first estimated in the frequency domain; then, the channel frequency response is transformed into the time domain by the IDFT. The noise is evenly distributed over the whole time domain and can easily be suppressed using a raised cosine window. Therefore, the effect of noise is small compared with that of the user's high mobility. Due to channel reciprocity in TDD systems, uplink CSI can be directly used in the design of downlink beamforming to realize cooperative signal processing. However, the nonstationary fast-fading characteristics of mobile environments bring challenges to multi-input multi-output (MIMO) communications.

[Figure: uplink reference signal transmission and downlink beamforming between the user and the base station.]

Figure 1. In TDD mobile communication systems, the beamforming performance of a high-speed mobile terminal worsens. The user sends an SRS, and the base station runs channel estimation algorithms and performs beamforming. However, due to the user's movement, the terminal can only obtain sidelobe gain in the downlink slots.

When the terminal is moving at a high speed, the Doppler frequency shift grows, and the time variability of the channel is severe. The measured uplink CSI cannot represent the real channel state of the downlink slots, resulting in performance degradation and a mismatch between the measured CSI and the actual channel. As shown in Figure 2, during the SRS reporting cycle, the user can be regarded as moving in a fixed direction at a fixed speed. In two adjacent time slots, due to the short moving distance of the user, the amplitude and phase of the direct and scattered components have a certain correlation. When the user speed is low, an AR-based channel prediction method can achieve good performance by utilizing the linear correlation between adjacent time slots. However, when the user is moving at a high speed, the channel correlation between adjacent time slots presents nonlinear characteristics, and kernel methods are needed to exploit these nonlinear correlation characteristics of the channel.

[Figure: slot pattern D D D S U U D D D; a scatterer and user velocity v; the LoS and scattering components correlate between the S slot and the D slots.]

Figure 2. In TDD mobile communication systems, the interval between slots S and D is short in two adjacent SRS periods, and the LoS and scattering components are correlated.

The user sends the SRS in special time slots, and the BS performs channel measurement and estimation to obtain the channel matrix. As shown in Figure 3, the channel matrix measured by the BS in the i-th SRS period is denoted by $H_i \in \mathbb{C}^{N_r \times N_t}$, where $N_t$ and $N_r$ represent the numbers of antennas at the BS and the mobile UE, respectively. The real and imaginary parts are processed separately in the subsequent algorithm, and H represents a real matrix in the later representation. The prediction order means that each channel matrix is related to the channel matrices measured in the previous order SRS periods, so the input vector of the prediction system is

$$u_i = [H_i, H_{i+1}, \ldots, H_{i+\mathrm{order}-1}]. \tag{1}$$

There are complex and unknown dependencies between $u_i$ and the next channel matrix, $H_{i+\mathrm{order}} = f(u_i) = f([H_i, H_{i+1}, \ldots, H_{i+\mathrm{order}-1}])$, that need to be exploited by kernel methods.
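To make this windowing concrete, a minimal Python sketch follows (our illustration, not the authors' code; the flattening of each matrix into a vector and the helper name build_pairs are assumptions):

```python
import numpy as np

def build_pairs(H_seq, order):
    """Build (u_i, y_i) pairs from a sequence of real channel matrices.

    Each input u_i stacks the matrices of `order` consecutive SRS periods;
    the target is the matrix measured in the following SRS period. In the
    paper the output y_i is scalar, so each entry of the flattened target
    can be treated as a separate output stream.
    """
    pairs = []
    for i in range(len(H_seq) - order):
        u_i = np.concatenate([H_seq[i + k].ravel() for k in range(order)])
        y_i = H_seq[i + order].ravel()
        pairs.append((u_i, y_i))
    return pairs

# Example: 10 snapshots of a 4 x 64 real-valued channel, prediction order 5.
rng = np.random.default_rng(0)
H_seq = [rng.standard_normal((4, 64)) for _ in range(10)]
pairs = build_pairs(H_seq, order=5)
print(len(pairs), pairs[0][0].shape)  # 5 pairs, inputs of length 5*4*64
```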

[Figure: one frame T = 10 ms, one subframe T = 1 ms, one slot T = 0.5 ms; the slot pattern D D D S U U repeats; channel matrices $H_i, H_{i+1}, \ldots, H_{i+\mathrm{order}}$ are measured and estimated in successive SRS periods and grouped into the inputs $u_i$, $u_{i+1}$.]

Figure 3. An illustration of the input and output pairs in a channel prediction module.

3. Traditional KRLS Algorithm


In this section, we introduce the traditional KRLS algorithm and several extensions to
the KRLS algorithm.

3.1. Traditional KRLS Algorithm


Assume a set of ordered input–output pairs $\mathcal{D} = \{u_i, y_i\}_{i=1}^{n}$, where n is the total number of samples, $u_i \in \mathbb{R}^m$ are m-dimensional input vectors, and $y_i \in \mathbb{R}$ is the output. We call $\mathcal{D}$ a dictionary that records the input–output pairs collected before the i-th time slot. First, according to Mercer's theorem, we can adopt a nonlinear function $\varphi(\cdot): \mathbb{R}^m \to \mathbb{R}^c$ to transform the data $u_i \in \mathbb{R}^m$ into a high-dimensional feature space. The corresponding kernel function is $\kappa(u, v) = \langle \varphi(u), \varphi(v) \rangle$. We need to minimize the cost function

$$J = \sum_{j=1}^{i} \left( y_j - w_i^T \varphi_j \right)^2 = \left\| \Phi_i^T w_i - y_i \right\|^2, \tag{2}$$

where $\Phi_i = [\varphi(u_1), \ldots, \varphi(u_i)] = [\varphi_1, \ldots, \varphi_i]$ is a high-dimensional matrix. By minimizing the cost function (2), we obtain the weights $w_i = \Phi_i^{\dagger} y_i$, where $\Phi_i^{\dagger}$ is the pseudoinverse of $\Phi_i$. $\Phi_i^{\dagger}$ cannot be computed directly because the kernel function $\kappa(u, v)$ corresponds to a high-dimensional mapping whose exact form is unknown. To avoid overfitting when the number of samples is small, we use L2 regularization, so the cost function is reformulated as

$$J = \left\| \Phi_i^T w_i - y_i \right\|^2 + \lambda \| w_i \|^2. \tag{3}$$

By letting $\partial J / \partial w_i = 0$, we obtain

$$w_i = \Phi_i [\lambda I + \Phi_i^T \Phi_i]^{-1} y_i = \Phi_i \alpha(i). \tag{4}$$

By substituting (4) into (3), the cost function can be reformulated as

$$J = \left\| \Phi_i^T \Phi_i \alpha(i) - y_i \right\|^2 + \lambda \| w_i \|^2 = \| K_i \alpha(i) - y_i \|^2 + \lambda \| w_i \|^2, \tag{5}$$

where $K_i$ is the kernel matrix whose element in the i-th row and j-th column is $K_i(i, j) = \kappa(u_i, u_j)$. The problem is to solve

$$\alpha(i) = [\lambda I + \Phi_i^T \Phi_i]^{-1} y_i = [\lambda I + K_i]^{-1} y_i = Q(i) y_i. \tag{6}$$

The inverse of $Q(i)$ can be written as

$$Q(i)^{-1} = \lambda I + K_i = \begin{bmatrix} Q(i-1)^{-1} & k_i \\ k_i^T & \lambda + \kappa(u_i, u_i) \end{bmatrix}, \tag{7}$$

where $k_i = \Phi_{i-1}^T \varphi_i$. Thus, $Q(i)$ can be obtained using the inverse of the partitioned matrix:

$$Q(i) = r(i)^{-1} \begin{bmatrix} Q(i-1) r(i) + z(i) z(i)^T & -z(i) \\ -z(i)^T & 1 \end{bmatrix}, \tag{8}$$

where $z(i) = Q(i-1) k_i$ and $r(i) = \lambda + \varphi_i^T \varphi_i - z(i)^T k_i$.

The traditional KRLS algorithm is summarized in Algorithm 1.

Algorithm 1 Traditional KRLS algorithm.

1: Initialize $Q(1) = (\lambda + \kappa(u_1, u_1))^{-1}$ and $\alpha(1) = Q(1) y_1$.
2: Iterate for i > 1: compute $k_i = \Phi_{i-1}^T \varphi_i$, $z(i) = Q(i-1) k_i$, $r(i) = \lambda + \varphi_i^T \varphi_i - z(i)^T k_i$, and

$$Q(i) = r(i)^{-1} \begin{bmatrix} Q(i-1) r(i) + z(i) z(i)^T & -z(i) \\ -z(i)^T & 1 \end{bmatrix},$$

then update the weights $\alpha(i) = Q(i) y_i$.
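For concreteness, the recursion of Algorithm 1 can be sketched in Python as follows (a minimal illustration assuming a Gaussian kernel and scalar outputs; the kernel width and regularization values are arbitrary):

```python
import numpy as np

def gauss_kernel(u, v, sigma=1.0):
    """Gaussian kernel kappa(u, v); the width sigma is an arbitrary choice."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

class KRLS:
    """Growing (unsparsified) KRLS, following Algorithm 1."""

    def __init__(self, u1, y1, lam=1e-2):
        self.lam = lam
        self.dict = [u1]                               # kernel dictionary
        self.Q = np.array([[1.0 / (lam + gauss_kernel(u1, u1))]])
        self.alpha = self.Q[0] * y1                    # alpha(1) = Q(1) y_1

    def predict(self, u):
        k = np.array([gauss_kernel(x, u) for x in self.dict])
        return k @ self.alpha

    def update(self, u, y):
        k = np.array([gauss_kernel(x, u) for x in self.dict])
        z = self.Q @ k                                 # z(i) = Q(i-1) k_i
        r = self.lam + gauss_kernel(u, u) - z @ k      # r(i)
        e = y - k @ self.alpha                         # a-priori error
        n = len(self.dict)
        Qn = np.empty((n + 1, n + 1))                  # grow Q via (8)
        Qn[:n, :n] = self.Q + np.outer(z, z) / r
        Qn[:n, n] = -z / r
        Qn[n, :n] = -z / r
        Qn[n, n] = 1.0 / r
        self.Q = Qn
        self.alpha = np.concatenate([self.alpha - z * e / r, [e / r]])
        self.dict.append(u)
```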

By relying on the kernel trick, the traditional KRLS algorithm can deal with nonlinear problems by nonlinearly transforming data into a high-dimensional reproducing kernel Hilbert space, similar to other techniques such as support vector machines (SVMs). Compared with SVMs, it avoids large-scale high-dimensional computation through iterative updates. However, the network of the traditional KRLS algorithm grows linearly with the number of processed data, leading to growing complexity for each consecutive update if no additional measures are taken. In order to render online kernel algorithms feasible, growth is typically slowed down by approximately representing the solution using only a subset of bases that are considered relevant according to a chosen criterion. In our proposed SSW-KRLS algorithm, we take similar measures to avoid an increase in computation, as discussed in detail in Section 4.

3.2. Extensions to the KRLS Algorithm

The kernel recursive least-squares method is one of the most efficient online kernel methods, and it achieves good performance in nonlinear fitting and prediction. However, the bottleneck problem is that the network structure grows with the training samples, which leads to insufficient memory and growing computational complexity when processing continuously incoming signals. In order to solve this problem and render the online kernel method feasible, researchers have proposed various sparseness methods or criteria, such as approximate linear dependency (ALD), the novelty criterion (NC), the surprise criterion (SC), and the coherence criterion (CC). On the basis of these sparseness methods or criteria, only new input samples that meet the prerequisites are used as training samples. Thus, the growth of the network structure is effectively slowed down. Table 1 shows several common sparseness criteria.

Table 1. Some common sparseness criteria.

ALD
Indicator: determine whether the kernel mapping of the new sample can be linearly represented by the mappings of the existing samples in the dictionary: $\delta(n) = \min_a \left\| \sum_{i=1}^{m} a(i) \varphi(x_i) - \varphi(x_n) \right\|^2$.
Handling: if $\delta(n) < \tau$, discard the new sample; if $\delta(n) \geq \tau$, add the new sample to the dictionary.

NC
Indicator: calculate the minimal distance between the new sample and the existing samples in the kernel dictionary: $\mathrm{dis} = \min_i \| x_n - x_i \|$.
Handling: if $\mathrm{dis} < \tau$, discard the new sample; if $\mathrm{dis} \geq \tau$, add the new sample to the dictionary.

SC
Indicator: according to information theory, based on a prior joint Gaussian distribution, the amount of information brought by the new sample is $S(n) = -\ln p(x_n, d_n \mid \mathcal{D}_{n-1})$, where $p(x_n, d_n \mid \mathcal{D}_{n-1})$ is the posterior probability distribution of $(x_n, d_n)$.
Handling: if $\tau_1 < S(n) < \tau_2$, add the new sample to the dictionary; if $S(n) > \tau_2$, discard the new sample.

CC
Indicator: calculate the maximal kernel function value between the new and existing samples: $\mu = \max_i |\kappa(x_i, x_n)|$.
Handling: if $\mu > \tau$, discard the new sample; if $\mu \leq \tau$, add the new sample to the dictionary.
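As a small illustration of how such criteria gate the dictionary, the following sketch implements the CC row of Table 1 (the threshold value is arbitrary, and kernel stands for any kernel function such as a Gaussian kernel):

```python
def cc_admit(dictionary, u_new, kernel, tau=0.9):
    """Coherence criterion (CC): admit u_new only if its maximal kernel
    value against the existing samples stays below the threshold tau."""
    mu = max(abs(kernel(x, u_new)) for x in dictionary)
    return mu < tau  # True -> add u_new to the dictionary; False -> discard
```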

The performance of the above methods in filtering samples largely depends on the selected threshold. As time goes by, the number of samples increases slowly and eventually stabilizes, while the update of the model parameters tends to slow down. This is not conducive to tracking time-varying channels. Furthermore, when the user moves into a new environment, outdated samples are retained, which is not conducive to channel prediction.
An effective way to keep the dictionary updated while precisely controlling its size is to use a sliding window. A simple implementation discards the oldest sample each time a new sample is collected, but this lacks any judgement of the correlation between samples. Sparseness based purely on time information is unreliable and may result in unstable prediction performance when the window is small. In order to solve this problem, we propose a new algorithm, SSW-KRLS, which takes both the time value of samples and the correlation between them into consideration.

4. Proposed SSW-KRLS Algorithm

Assume that there is a set of input–output pairs $\mathcal{D} = \{u_i, y_i\}_{i=1}^{L}$, where L is the total number of samples, $u_i \in \mathbb{R}^m$ are m-dimensional input vectors, and $y_i \in \mathbb{R}$ is the output. In the traditional KRLS algorithm, the dictionary grows in each time slot when a new sample arrives, which leads to increasing computation and memory requirements. To solve this problem, we use a sliding-window approach to keep the size of the kernel dictionary within a fixed budget. Our criterion for discarding old samples is based on the correlation between the existing samples in the kernel dictionary and the new sample. Moreover, we introduce a forgetting factor to exponentially weigh older data by scaling them, so as to track the dynamic characteristics of the channel.

In our proposed SSW-KRLS algorithm, we solve the following least-squares cost function:

$$\min_w \sum_{j=1}^{i} \beta^{i-j} \left( y_j - w^T \varphi_j \right)^2 + \lambda B(i) w^T w, \tag{9}$$

where β is the forgetting factor, i is the iteration number, $y_i = (y_1, \ldots, y_i)^T$ is the output vector, w is the weight vector, $B(i) = \mathrm{diag}(\beta^{i-1}, \beta^{i-2}, \ldots, 1)$ is the forgetting matrix, $\varphi_j = \varphi(u_j)$ is the transformation of $u_j$, and λ is the regularization parameter. The optimal $w^*$ is

$$w^* = \left[ \lambda B(i) + \Phi_i B(i) \Phi_i^T \right]^{-1} \Phi_i B(i) y_i. \tag{10}$$

We reformulate the equation as

$$w^* = \Phi_i [\lambda B(i) + B(i) K_i]^{-1} \bar{y}_i, \tag{11}$$

where $\bar{y}_i = B(i) y_i$ is the exponentially weighted output signal.
The solution for $Q(i) = [\lambda B(i) + B(i) K_i]^{-1}$ differs between two cases in the i-th iteration: in one case, the size of the dictionary increases; in the other, it remains unchanged. Cases I and II are discussed in Sections 4.2 and 4.3, respectively. In Section 4.1, we introduce how to update the dictionary when a new sample arrives. Our method depends on whether the size of the dictionary has reached the fixed sample budget, denoted by M. In particular, when the dictionary is not full, we may still discard an old sample on the basis of a correlation analysis between the new sample and the existing samples in the dictionary, in order to slow down the growth of the dictionary. When the dictionary is full, we set a candidate discard set containing the samples highly correlated with the new sample and determine which sample to discard on the basis of their corresponding outputs. The whole process of the proposed algorithm is shown in Figure 4.
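Before detailing the two cases, here is a minimal sketch of the forgetting matrix B(i) and the exponential weighting of (9)–(11), assuming β = 0.97 as in the later simulations and arbitrary toy outputs:

```python
import numpy as np

def forgetting_matrix(i, beta=0.97):
    """B(i) = diag(beta^(i-1), ..., beta, 1): older samples get smaller weights."""
    return np.diag(beta ** np.arange(i - 1, -1, -1))

y = np.array([0.4, 0.1, -0.2, 0.5])    # outputs y_1 .. y_4 (toy values)
y_bar = forgetting_matrix(len(y)) @ y  # exponentially weighted outputs B(i) y_i
```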

[Figure: the algorithm flow is as follows: form the input–output pair $u_i$, $H_{i+\mathrm{order}}$ from the estimated CSI; update the kernel dictionary $D(i) = [D_1, D_2, \ldots, D_n]$; update the intermediate matrix $Q(i)$ and the weights $\alpha(i) = Q(i)\bar{y}(i)$; output the CSI prediction $\hat{H}_{i+\mathrm{order}} = \alpha(i) k(i)^T$; and calculate the prediction error $e(i) = H_{i+\mathrm{order}} - \hat{H}_{i+\mathrm{order}}$.]

Figure 4. An illustration of the channel prediction algorithm steps.

4.1. How to Optimally Discard an Old Sample

The cosine value is usually adopted for judging the correlation of two vectors. In the KRLS algorithm, however, the cosine value is calculated in the kernel Hilbert space. Suppose $u_i$ is a new sample and $u_j$ is an existing sample in the dictionary. Let $k(u_j)$ denote the kernel vector of $u_j \in \mathcal{D}$, $k(u_j) = \{k_{jn}\}_{n \neq i,j}$, where $k_{jn} = \kappa(u_j, u_n)$. To measure the correlation between the existing and new samples, we calculate the cosine value of $k(u_i)$ and $k(u_j)$ for all $u_j \in \mathcal{D}$ as

$$\cos\left(k(u_i), k(u_j)\right) = \frac{k(u_i) k(u_j)^T}{|k(u_i)| \, |k(u_j)|}.$$

When the size of the dictionary has not reached the budget, we look for the sample with the highest correlation with the new sample. The existing sample $u_{j^*}$ with the highest cosine value, $j^* = \arg\max_j \cos(k(u_i), k(u_j))$, is the most probable to be discarded. We set a threshold τ and discard sample $u_{j^*}$ if $\cos(k(u_i), k(u_{j^*})) \geq \tau$. The updated dictionary is

$$\mathcal{D}(i) = \begin{cases} \left[ \mathcal{D}(i-1) \backslash u_{j^*}, \, u_i \right], & \text{if } \max_j \cos(k(u_i), k(u_j)) \geq \tau \\ \left[ \mathcal{D}(i-1), \, u_i \right], & \text{if } \max_j \cos(k(u_i), k(u_j)) < \tau \end{cases} \tag{12}$$
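The cosine test can be sketched as follows (our illustration; K is a precomputed kernel matrix over the dictionary plus the new sample, with entries i and j excluded as in the definition of k(u_j) above):

```python
import numpy as np

def kernel_cosine(K, i, j):
    """Cosine between kernel vectors k(u_i) and k(u_j), with entries i and j
    excluded from both vectors, computed from a kernel matrix K."""
    mask = np.ones(K.shape[0], dtype=bool)
    mask[[i, j]] = False
    ki, kj = K[i, mask], K[j, mask]
    return ki @ kj / (np.linalg.norm(ki) * np.linalg.norm(kj))
```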

In the other case, when the size of the dictionary has reached the fixed budget, one sample must be discarded from the kernel dictionary in each time slot. In our strategy, we first set a candidate discard set on the basis of the correlation with the new sample, and then determine the optimal sample to discard according to its output value.

The candidate discard set S is composed of the samples that have high correlation with the new sample. Specifically, an existing sample $u_j \in \mathcal{D}$ is added to S if $\kappa(u_i, u_j) > \varepsilon$, where $u_i$ is the new sample and ε is a threshold. Among the samples in S, the one with the smallest value for prediction may be the one carrying information most similar to the new sample, or an untypical one with little probability of occurring. We determine the sample to discard in combination with its corresponding output as follows.

Suppose that the candidate discard set is $S = \{u_1, u_2, \ldots, u_n\}$. The weighted average of the output values can be obtained as $\bar{y} = \sum_{j=1}^{n} w_j y_j$, where

$$w_j = \frac{\kappa(u_i, u_j)}{\sum_{j=1}^{n} \kappa(u_i, u_j)} \tag{13}$$

and n is the number of samples in S.

The deviation between the output value of $u_j \in S$ and the weighted average output value is $e_j = y_j - \bar{y}$. The sample with the maximal deviation $e_{\max}$ is not typical and may have occurred with small probability, while the one with the minimal deviation carries information similar to that of the new sample. Suppose that the sample with the maximal deviation is $u_{j_{\max}}$, and the sample with the minimal deviation is $u_{j_{\min}}$. We choose one of these two samples to discard according to the product of $e_{\max}$ and $e_{\min}$. If the product of $e_{\max}$ and $e_{\min}$ is larger than the threshold τ, $e_{\max}$ is large enough, so we discard $u_{j_{\max}}$; otherwise, $e_{\min}$ is small enough, and we discard $u_{j_{\min}}$. The updated dictionary is

$$\mathcal{D}(i) = \begin{cases} \left[ \mathcal{D}(i-1) \backslash u_{j_{\max}}, \, u_i \right], & \text{if } e_{\max} e_{\min} \geq \tau \\ \left[ \mathcal{D}(i-1) \backslash u_{j_{\min}}, \, u_i \right], & \text{if } e_{\max} e_{\min} < \tau. \end{cases} \tag{14}$$
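The full-budget discard rule of (13) and (14) can be sketched as follows (our illustration; the thresholds are arbitrary, the deviations are taken in absolute value, and the empty-candidate fallback is our addition for robustness):

```python
import numpy as np

def choose_discard(samples, outputs, u_new, kernel, eps=0.5, tau=1e-3):
    """Pick the index of the sample to discard when the dictionary is full."""
    kvals = np.array([kernel(x, u_new) for x in samples])
    S = np.where(kvals > eps)[0]          # candidate discard set
    if len(S) == 0:                       # fallback: most correlated sample
        return int(np.argmax(kvals))
    w = kvals[S] / kvals[S].sum()         # weights of (13)
    y_bar = w @ np.asarray(outputs)[S]    # weighted average output
    e = np.abs(np.asarray(outputs)[S] - y_bar)
    j_max, j_min = S[np.argmax(e)], S[np.argmin(e)]
    return int(j_max) if e.max() * e.min() >= tau else int(j_min)
```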

4.2. Case I: The Size of D(i) Is Changed

As mentioned before, the prediction process depends on whether the size of the dictionary has increased or not. In Case I, when the size of the dictionary increases, we can obtain from (11) that

$$w_i = \Phi_i \alpha(i), \quad \alpha(i) = Q(i) \bar{y}_i, \quad Q(i) = [\lambda B(i) + B(i) K_i]^{-1}. \tag{15}$$

With the combination of B(i) and $y_i$, we obtain the exponentially weighted output signal $\bar{y}_i$. We can find that

$$B(i) = \begin{bmatrix} \beta B(i-1) & 0 \\ 0^T & 1 \end{bmatrix} \tag{16}$$

$$K_i = \begin{bmatrix} K_{i-1} & h(i) \\ h(i)^T & \kappa(u_i, u_i) \end{bmatrix} \tag{17}$$

$$\bar{y}_i = \begin{bmatrix} \beta \bar{y}_{i-1} \\ y_i \end{bmatrix}, \tag{18}$$

where $h(i) = [\kappa(u_1, u_i), \kappa(u_2, u_i), \ldots, \kappa(u_{i-1}, u_i)]^T$. Thus,

$$B(i) K_i = \begin{bmatrix} \beta B(i-1) K_{i-1} & \beta B(i-1) h(i) \\ h(i)^T & \kappa(u_i, u_i) \end{bmatrix}. \tag{19}$$

By substituting (19) into (15), we obtain

$$Q(i) = \begin{bmatrix} \beta Q(i-1)^{-1} & \beta B(i-1) h(i) \\ h(i)^T & \kappa(u_i, u_i) + \lambda \end{bmatrix}^{-1}. \tag{20}$$

With partitioned matrix inversion (Appendix A), we obtain $Q(i)$ in recursive form:

$$Q(i) = r(i)^{-1} \begin{bmatrix} \beta^{-1} r(i) Q(i-1) + \beta^{-1} z_B(i) z(i)^T & -z_B(i) \\ -\beta^{-1} z(i)^T & 1 \end{bmatrix}, \tag{21}$$

where

$$z_B(i) = Q(i-1) B(i-1) h(i), \quad r(i) = \kappa(u_i, u_i) + \lambda - h(i)^T z_B(i), \quad z(i) = Q(i-1)^T h(i). \tag{22}$$

The weight vector is then updated as

$$\alpha(i) = \begin{bmatrix} \alpha(i-1) - z_B(i) r(i)^{-1} e(i) \\ r(i)^{-1} e(i) \end{bmatrix}, \tag{23}$$

where $e(i) = y_i - h(i)^T \alpha(i-1)$ is the prediction error in the i-th time slot.
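A direct transcription of (21)–(23) might look as follows (our sketch; the arguments follow the notation above):

```python
import numpy as np

def case1_update(Q, alpha, B_prev, h, k_uu, y, beta=0.97, lam=1e-2):
    """Grow Q(i-1) and alpha(i-1) by one sample, following (21)-(23).

    B_prev: forgetting matrix B(i-1); h: kernel vector h(i);
    k_uu: kappa(u_i, u_i); y: the new output y_i."""
    z_B = Q @ (B_prev @ h)                # z_B(i) in (22)
    z = Q.T @ h                           # z(i) in (22)
    r = k_uu + lam - h @ z_B              # r(i) in (22)
    n = Q.shape[0]
    Qn = np.empty((n + 1, n + 1))
    Qn[:n, :n] = Q / beta + np.outer(z_B, z) / (r * beta)   # per (21)
    Qn[:n, n] = -z_B / r
    Qn[n, :n] = -z / (r * beta)
    Qn[n, n] = 1.0 / r
    e = y - h @ alpha                     # prediction error e(i)
    alpha_n = np.concatenate([alpha - z_B * e / r, [e / r]])
    return Qn, alpha_n
```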

4.3. Case II: The Size of D(i) Is Unchanged

In the case when the size of the dictionary does not change, the information of the discarded sample $u_{j^*}$ in $Q(i-1)$ needs to be deleted. The information of $u_{j^*}$ lies in the $j^*$-th column and $j^*$-th row of $Q(i-1)$. In order not to disturb the update of the matrix, we move the $j^*$-th column and $j^*$-th row of $Q(i-1)$ into the first row and first column, obtaining $\hat{Q}(i-1)$. Correspondingly, we apply the same transformation to $K_{i-1}$ and obtain $\hat{K}(i-1)$. Since $Q(i-1) = [\lambda B(i-1) + B(i-1) K_{i-1}]^{-1}$, after the movement, the j-th columns with $j < j^*$ should be multiplied by β.

Suppose that the matrices after removing the first column and first row of $\hat{Q}(i-1)$ and $\hat{K}(i-1)$ are $\tilde{Q}(i-1)$ and $\tilde{K}(i-1)$, respectively. We can obtain the inverse $\tilde{Q}(i-1)^{-1}$ according to Appendix B: with

$$\hat{Q}(i-1) = \begin{bmatrix} e & f^T \\ l & G \end{bmatrix}, \tag{24}$$

$$\tilde{Q}(i-1)^{-1} = G - l f^T / e. \tag{25}$$



The new matrix $Q(i)$ can be formulated as a partitioned matrix:

$$Q(i) = \begin{bmatrix} \beta \tilde{Q}(i-1)^{-1} & b \\ k(i)^T & \kappa(u_i, u_i) + \lambda \end{bmatrix}^{-1} \equiv \begin{bmatrix} A & b \\ p^T & \kappa(u_i, u_i) + \lambda \end{bmatrix}^{-1}. \tag{26}$$

Then, $Q(i)$ can be obtained using the partitioned matrix inverse:

$$Q(i) = \begin{bmatrix} A^{-1} \left( I + b p^T A^{-1} g \right) & -A^{-1} b g \\ -p^T A^{-1} g & g \end{bmatrix}, \tag{27}$$

where $A^{-1} = \beta^{-1} \tilde{K}(i-1)^{-1} B(i-1)^{-1}$ and $g = \left( \kappa(u_i, u_i) + \lambda - p^T A^{-1} b \right)^{-1}$.

Then, the weight coefficients are updated:

$$\alpha(i) = Q(i) \bar{y}_i = Q(i) B(i) y_i, \tag{28}$$

where $y_i$ is composed of the output values of the samples, $y_i = [y_1, y_2, \ldots, y_i]^T$. Lastly, we can obtain the prediction value for the next time slot as $\hat{y}_{i+1} = \alpha(i) k(i)^T$.
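The row/column removal of (24) and (25) can be sketched as follows (our illustration; the β-scaling of the columns with j < j* described above is omitted for brevity):

```python
import numpy as np

def remove_sample(Q, j):
    """Delete sample j's information from Q(i-1): move its row and column to
    the front, as in (24), then apply the Schur-complement downdate (25)."""
    idx = [j] + [t for t in range(Q.shape[0]) if t != j]
    Qh = Q[np.ix_(idx, idx)]                 # Q-hat(i-1)
    e, f, l, G = Qh[0, 0], Qh[0, 1:], Qh[1:, 0], Qh[1:, 1:]
    return G - np.outer(l, f) / e            # reduced matrix per (25)
```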

5. Performance Evaluation
On the basis of the analysis above, we present the algorithmic steps as Algorithm 2. In this section, we show the simulation results of our proposed SSW-KRLS algorithm and compare its performance to that of the ALD-KRLS algorithm as a baseline. The basic simulation parameters are listed in Table 2. We adopted a 3D urban macro scenario and considered a typical setting of a 3 GHz carrier frequency with 30 kHz subcarrier spacing. We considered 20 MHz of bandwidth, which contains 51 resource blocks. The adopted channel model was the CDL-A channel model, and the DL precoder was RZF.

Algorithm 2 Proposed SSW-KRLS algorithm.

1: Initialize $Q(1) = (\lambda + \kappa(u_1, u_1))^{-1}$, $B(1) = [1]$, $\alpha(1) = Q(1) y_1$, $D(1) = [u_1]$.
2: Step 1: Iterate for i > 1: judge the number of samples L in D(i). If L < M, perform Step 2; otherwise, perform Step 3.
3: Step 2: For each sample $u_j$ in $D(i-1)$, compute $\cos(k(u_i), k(u_j))$. Find $j^* = \arg\max_j \cos(k(u_i), k(u_j))$. If $\max_j \cos(k(u_i), k(u_j)) > \tau$, discard $u_{j^*}$. Then, add the new sample $u_i$ into the dictionary. Turn to Step 4.
4: Step 3: Construct the candidate discard set S. Suppose $S = \{u_1, u_2, \ldots, u_n\}$. Then, calculate the weighted average output $\bar{y}$ of the samples according to (13). For each sample $u_j$, calculate $e_j = y_j - \bar{y}$. Find $j_{\max} = \arg\max_j e_j$ and $j_{\min} = \arg\min_j e_j$. If $e_{\min} e_{\max} > \tau$, $D(i) = [D(i-1) \backslash u_{j_{\max}}, u_i]$. If $e_{\min} e_{\max} \leq \tau$, $D(i) = [D(i-1) \backslash u_{j_{\min}}, u_i]$. Turn to Step 4.
5: Step 4: If D(i) is larger than D(i-1), perform Step 5; otherwise, perform Step 6.
6: Step 5: Calculate Q(i) according to (21), computing the intermediate quantities $z_B(i)$, $r(i)$, and $z(i)$ according to (22). Calculate α(i) according to (23). The prediction value for the next time slot is $\hat{y}_{i+1} = \alpha(i) k(i)^T$.
7: Step 6: For $Q(i-1)$, move the $j^*$-th column and $j^*$-th row into the first column and first row to obtain $\hat{Q}(i-1)$; calculate $\tilde{Q}(i-1)^{-1}$ by (24) and (25). Update Q(i) according to (26) and (27). Then, the prediction value is obtained on the basis of (28).

Table 2. Basic simulation parameters.

Scenario: 3D urban macro (3D UMa)
Carrier frequency: 3 GHz
Subcarrier spacing: 30 kHz
Bandwidth: 20 MHz
Channel model: CDL-A
Delay spread: 100 ns
DL precoder: RZF
Prediction order: 5

Figure 5 shows the normalized mean squared error (NMSE) performance of the different algorithms. We show the performance of the ALD-KRLS and SSW-KRLS algorithms at the 60 and 120 km/h velocity levels for all UEs. When the UE speed was 60 km/h, the prediction algorithms were more accurate than at a UE speed of 120 km/h. Regarding the performance of the SSW-KRLS algorithm, we set different sample budgets: M = 30, 90, 150. The algorithm performed better with a higher sample budget. The SSW-KRLS algorithm with sample budget M = 30 performed better than the ALD-KRLS algorithm with v = 0.001, and the SSW-KRLS algorithm with sample budget M = 90 greatly outperformed the ALD-KRLS algorithm. However, the performance improved only insignificantly when the sample budget was increased from 90 to 150.
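For reference, the NMSE in dB plotted in Figure 5 can be computed with the standard definition below (the paper does not spell out the exact formula, so this normalization is an assumption):

```python
import numpy as np

def nmse_db(H_true, H_pred):
    """NMSE in dB: 10 log10( ||H - H_hat||^2 / ||H||^2 )."""
    err = np.linalg.norm(H_true - H_pred) ** 2
    ref = np.linalg.norm(H_true) ** 2
    return 10.0 * np.log10(err / ref)
```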

[Figure: NMSE (dB) versus iteration (0–500) for SSW-KRLS with M = 30, 90, 150 and ALD-KRLS with v = 0.0005 and v = 0.0001, at user speeds of 60 and 120 km/h.]

Figure 5. NMSE comparison of the proposed SSW-KRLS algorithm and the ALD-KRLS algorithm. β = 0.97, Nt = 64, Nr = 4.

Figure 6 shows the kernel dictionary size over 400 iterations. The kernel dictionary size of all algorithms grew at first; the growth then slowed down for ALD-KRLS, while the size for SSW-KRLS remained unchanged after reaching the budget. The SSW-KRLS algorithm with sample budget M = 90 outperformed the ALD-KRLS algorithm with v = 0.001 while using fewer samples. This indicates the superiority of our proposed SSW-KRLS algorithm.

[Figure: kernel dictionary size versus iteration (0–400) for SSW-KRLS with M = 30, 90, 150 and ALD-KRLS with v = 0.0005 and v = 0.0001.]

Figure 6. The kernel dictionary size of the proposed SSW-KRLS algorithm and ALD-KRLS algorithm varies with the number of iterations. β = 0.97, Nt = 64, Nr = 4.

Figures 7 and 8 show the mean rate of mobile users under the different algorithms at speeds of 60 and 120 km/h, respectively. Comparing the figures at the two speed levels shows that the user with the lower speed had a higher mean rate. In addition, the performance could be greatly enhanced by using a channel prediction algorithm. In particular, our proposed SSW-KRLS algorithm with sample budgets M = 150 and M = 90 achieved better performance than the ALD-KRLS algorithm with v = 0.0001, and for our proposed SSW-KRLS algorithm, a higher sample budget brought about better performance. Moreover, users with more antennas showed better performance under all circumstances.

[Figure: mean rate (bps/Hz) versus SNR (dB, −5 to 25) for no prediction, SSW-KRLS with M = 150 and M = 90, and ALD-KRLS with v = 0.0001, for N = 16 and N = 128 antennas.]

Figure 7. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS algorithm, and the proposed SSW-KRLS algorithm. β = 0.97, v = 60 km/h.

[Figure: mean rate (bps/Hz) versus SNR (dB, −5 to 25) for no prediction, SSW-KRLS with M = 150 and M = 90, and ALD-KRLS with v = 0.0001, for N = 16 and N = 128 antennas.]

Figure 8. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS algorithm, and the proposed SSW-KRLS algorithm. β = 0.97, v = 120 km/h.

6. Conclusions
This paper proposed a new sparse sliding-window KRLS algorithm where a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary. The sample to discard is then determined in combination with its corresponding output to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm, which maintains the size of the kernel dictionary within the sample budget, requires a fixed amount of memory and computation per time step, incorporates regularization, and achieves online prediction. Moreover, in order to sufficiently track strongly changeable dynamic characteristics, a forgetting factor was considered in the proposed algorithm. Numerical simulations demonstrated that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieved superior performance in terms of both predictive accuracy and kernel dictionary size compared with the ALD-KRLS algorithm. The NMSE for the channel prediction of the SSW-KRLS algorithm with M = 90 was about 2 dB lower than that of the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller.

Author Contributions: Investigation, X.A. and J.Z.; methodology, X.A.; project administration, X.A.;
software, X.A.; resources, H.Z.; writing—original draft preparation, J.Z.; writing—review and editing,
H.Z.; funding acquisition, X.A. and Y.S. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was funded by the ZTE Corporation Research Program.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
For a nonsingular matrix A, if a new column and a new row are added to the matrix to obtain E as follows:

$$E = \begin{bmatrix} A & b \\ b^T & c \end{bmatrix}, \tag{A1}$$

suppose the inverse of E is formulated as

$$E^{-1} = \begin{bmatrix} D & e \\ e^T & f \end{bmatrix}. \tag{A2}$$

Then, $E^{-1}$ can be calculated by solving

$$\begin{cases} A D + b e^T = I \\ A e + b f = 0 \\ b^T e + c f = 1 \end{cases}, \tag{A3}$$

from which $E^{-1}$ can be obtained as

$$E^{-1} = \begin{bmatrix} A^{-1} \left( I + b b^T A^{-1} f \right) & -A^{-1} b f \\ -f b^T A^{-1} & f \end{bmatrix}, \tag{A4}$$

where $f = \left( c - b^T A^{-1} b \right)^{-1}$.
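A quick numerical check of (A4) on a random symmetric test matrix (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
A = A @ A.T + 4 * np.eye(4)                      # nonsingular, symmetric
b = rng.standard_normal(4)
c = 5.0
E = np.block([[A, b[:, None]], [b[None, :], np.array([[c]])]])
Ainv = np.linalg.inv(A)
f = 1.0 / (c - b @ Ainv @ b)                     # scalar in (A4)
top_left = Ainv @ (np.eye(4) + np.outer(b, b) @ Ainv * f)
E_inv = np.block([[top_left, (-Ainv @ b * f)[:, None]],
                  [(-f * (b @ Ainv))[None, :], np.array([[f]])]])
assert np.allclose(E_inv, np.linalg.inv(E))      # (A4) matches direct inversion
print("Appendix A block-inverse formula verified")
```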

Appendix B
For a nonsingular matrix E, if the first column and the first row are removed from the matrix to obtain C, write

$$E = \begin{bmatrix} a & b^T \\ b & C \end{bmatrix}. \tag{A5}$$

Suppose the inverse of E is formulated as

$$E^{-1} = \begin{bmatrix} d & e^T \\ e & F \end{bmatrix}. \tag{A6}$$

Then, $C^{-1}$ can be calculated by solving

$$\begin{cases} b d + C e = 0 \\ b e^T + C F = I \end{cases}, \tag{A7}$$

from which $C^{-1}$ can be obtained as $C^{-1} = F - e e^T / d$.
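A matching numerical check for Appendix B: remove the first row and column of a random E and recover C^{-1} from the blocks of E^{-1}:

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.standard_normal((5, 5))
E = E @ E.T + 5 * np.eye(5)                 # nonsingular, symmetric
C = E[1:, 1:]                               # E with first row/column removed
Einv = np.linalg.inv(E)
d, e_vec, F = Einv[0, 0], Einv[1:, 0], Einv[1:, 1:]
C_inv = F - np.outer(e_vec, e_vec) / d      # Appendix B formula
assert np.allclose(C_inv, np.linalg.inv(C))
print("Appendix B deletion formula verified")
```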

References
1. Li, M.; Collings, I.B.; Hanly, S.V.; Liu, C.; Whiting, P. Multicell Coordinated Scheduling with Multiuser Zero-Forcing Beamforming.
IEEE Trans. Wirel. Commun. 2016, 15, 827–842. [CrossRef]
2. Yin, H.; Wang, H.; Liu, Y.; Gesbert, D. Addressing the Curse of Mobility in Massive MIMO with Prony-Based Angular-Delay
Domain Channel Predictions. IEEE J. Sel. Areas Commun. 2020, 38, 2903–2917. [CrossRef]
3. Li, X.; Jin, S.; Suraweera, H.A.; Hou, J.; Gao, X. Statistical 3-D Beamforming for Large-Scale MIMO Downlink Systems over Rician
Fading Channels. IEEE Trans. Commun. 2016, 64, 1529–1543. [CrossRef]
4. Sapavath, N.N.; Rawat, D.B.; Song, M. Machine Learning for RF Slicing Using CSI Prediction in Software Defined Large-Scale
MIMO Wireless Networks. IEEE Trans. Netw. Sci. Eng. 2020, 7, 2137–2144. [CrossRef]
5. Huang, C.; Liu, L.; Yuen, C.; Sun, S. Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO
Systems. IEEE Trans. Signal Process. 2019, 67, 245–259. [CrossRef]
6. Bellili, F.; Sohrabi, F.; Yu, W. Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation With
Laplacian Prior. IEEE Trans. Commun. 2019, 67, 3205–3219. [CrossRef]
7. Wu, D.; Zhang, H. Tractable Modelling and Robust Coordinated Beamforming Design with Partially Accurate CSI. IEEE Wirel.
Commun. Lett. 2021, 10, 2384–2387. [CrossRef]
8. Lu, A.; Gao, X.; Xiao, C. Free Deterministic Equivalents for the Analysis of MIMO Multiple Access Channel. IEEE Trans. Inf.
Theory 2016, 62, 4604–4629. [CrossRef]
9. Wu, C.; Yi, X.; Zhu, Y.; Wang, W.; You, L.; Gao, X. Channel Prediction in High-Mobility Massive MIMO: From Spatio-Temporal
Autoregression to Deep Learning. IEEE J. Sel. Areas Commun. 2021, 39, 1915–1930. [CrossRef]

10. Chen, M.; Viberg, M. Long-Range Channel Prediction Based on Nonstationary Parametric Modeling. IEEE Trans. Signal Process.
2009, 57, 622–634. [CrossRef]
11. Liu, L.; Feng, H.; Yang, T.; Hu, B. MIMO-OFDM Wireless Channel Prediction by Exploiting Spatial-Temporal Correlation. IEEE
Trans. Wirel. Commun. 2014, 13, 310–319. [CrossRef]
12. Lv, C.; Lin, J.-C.; Yang, Z. Channel Prediction for Millimeter Wave MIMO-OFDM Communications in Rapidly Time-Varying
Frequency-Selective Fading Channels. IEEE Access 2019, 7, 15183–15195. [CrossRef]
13. Yuan, J.; Ngo, H.Q.; Matthaiou, M. Machine Learning-Based Channel Prediction in Massive MIMO with Channel Aging. IEEE
Trans. Wirel. Commun. 2020, 19, 2960–2973. [CrossRef]
14. Zhao, J.; Tian, H.; Li, D. Channel Prediction Based on BP Neural Network for Backscatter Communication Networks. Sensors
2020, 20, 300. [CrossRef]
15. Ahrens, J.; Ahrens, L.; Schotten, H.D. A Machine Learning Method for Prediction of Multipath Channels. ZTE Commun. 2019, 17,
12–18.
16. Sanchez-Fernandez, M.; de-Prado-Cumplido, M.; Arenas-Garcia, J.; Perez-Cruz, F. SVM multiregression for nonlinear channel
estimation in multiple-input multiple-output systems. IEEE Trans. Signal Process. 2004, 52, 2298–2307. [CrossRef]
17. Engel, Y.; Mannor, S.; Meir, R. The kernel recursive least-squares algorithm. IEEE Trans. Signal Process. 2004, 52, 2275–2285.
[CrossRef]
18. Liu, W.; Park, I.; Wang, Y.; Principe, J.C. Extended Kernel Recursive Least Squares Algorithm. IEEE Trans. Signal Process. 2009, 57,
3801–3814.
19. Guo, J.; Chen, H.; Chen, S. Improved Kernel Recursive Least Squares Algorithm Based Online Prediction for Nonstationary Time
Series. IEEE Signal Process. Lett. 2020, 27, 1365–1369. [CrossRef]
20. Platt, J. A Resource-Allocating Network for Function Interpolation. Neural Comput. 1991, 3, 213–225. [CrossRef]
21. Liu, W.; Park, I.; Principe, J.C. An Information Theoretic Approach of Designing Sparse Kernel Adaptive Filters. IEEE Trans.
Neural Netw. 2009, 20, 1950–1961. [CrossRef] [PubMed]
22. Richard, C.; Bermudez, J.C.M.; Honeine, P. Online Prediction of Time Series Data with Kernels. IEEE Trans. Signal Process. 2009,
57, 1058–1067. [CrossRef]
23. Yang, M.; Ai, B.; He, R.; Huang, C.; Ma, Z.; Zhong, Z.; Wang, J.; Pei, L.; Li, Y.; Li, J. Machine-Learning-Based Fast Angle-of-Arrival
Recognition for Vehicular Communications. IEEE Trans. Veh. Technol. 2021, 70, 1592–1605. [CrossRef]
24. Cherkassky, V.; Mulier, F.M. Statistical Learning Theory. In Learning from Data: Concepts, Theory, and Methods, 1st ed.; John Wiley &
Sons: Hoboken, NJ, USA, 2007.
25. Vaerenbergh, S.V.; Lazaro-Gredilla, M.; Santamaria, I. Kernel Recursive Least-Squares Tracker for Time-Varying Regression. IEEE
Trans. Neural Netw. Learn. Syst. 2012, 23, 1313–1326. [CrossRef]
26. Xiong, K.; Wang, S. The Online Random Fourier Features Conjugate Gradient Algorithm. IEEE Signal Process. Lett. 2019, 26,
740–744. [CrossRef]
