Article
Sparse Sliding-Window Kernel Recursive Least-Squares
Channel Prediction for Fast Time-Varying MIMO Systems
Xingxing Ai 1, Jiayi Zhao 2, Hongtao Zhang 2 and Yong Sun 2,*
Abstract: Accurate channel state information (CSI) is important for MIMO systems; especially in high-speed scenarios, fast time-varying CSI tends to be out of date, and changes in CSI show complex nonlinearities. The kernel recursive least-squares (KRLS) algorithm, which offers an attractive framework to deal with nonlinear problems, can be used to predict nonlinear time-varying CSI. However, the network structure of the traditional KRLS algorithm grows as the training sample size increases, resulting in insufficient storage space and increasing computation when dealing with incoming data, which limits online prediction with the KRLS algorithm. This paper proposes a new sparse sliding-window KRLS (SSW-KRLS) algorithm where a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary; then, the discarded sample is determined in combination with its corresponding output to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm maintains the size of the kernel dictionary within the sample budget, requires a fixed amount of memory and computation per time step, incorporates regularization, and achieves online prediction. Moreover, in order to sufficiently track strongly changeable dynamic characteristics, a forgetting factor is considered in the proposed algorithm. Numerical simulations demonstrate that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieved superior performance in terms of both predictive accuracy and kernel dictionary size compared with the ALD-KRLS algorithm. Our proposed SSW-KRLS algorithm with M = 90 achieved 2 dB lower NMSE than the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller, when the speed of the mobile user was 120 km/h.

Citation: Ai, X.; Zhao, J.; Zhang, H.; Sun, Y. Sparse Sliding-Window Kernel Recursive Least-Squares Channel Prediction for Fast Time-Varying MIMO Systems. Sensors 2022, 22, 6248. https://doi.org/10.3390/s22166248
Academic Editor: Peter Han Joo Chong
1. Introduction

In TDD systems, the user sends a sounding reference signal (SRS), and the base station
performs channel estimation algorithms such as LS [5] and MMSE [6]. Then, the obtained
CSI is used for downlink beamforming to realize cooperative signal processing. The
coherence time of wireless channels is the time duration after which CSI is considered to be
outdated. When the terminal is moving at a high speed, the Doppler frequency shift grows,
and the time variability of the channel is severe, which leads to the shortening of channel
coherence time. The measured uplink channel CSI cannot represent the real channel state
of downlink slots, resulting in the mismatch between the downlink beamforming designed
according to the measured CSI and the actual channel. In [1], with a typical CSI delay of 4 ms, a user terminal speed of 30 km/h led to as much as a 50% performance reduction compared with the low-mobility scenario at 3 km/h.
In order to overcome the performance degradation caused by severe time-varying
channels, [7] proposed a tractable user-centric CSI model and a robust beamforming design
by taking deterministic equivalents [8] into account. However, when the user terminal moves at high speed, the channel exhibits nonstationary characteristics, and the statistical characteristics of the channel also change with time, so such statistical models cannot be used for beamforming over high-speed time-varying channels.
Another approach is to use a channel prediction algorithm [9–16] to obtain more
accurate CSI for beamforming. The kernel method is widely used in channel prediction
algorithms due to its ability to track nonlinear channels, and its adaptation to time-varying
channels [17–22]. However, algorithms based on kernel methods face the problem in online prediction that the network structure grows as the number of training samples grows over time.
time. Though researchers have proposed sparseness methods to limit the size of samples
by setting prerequisites for new samples added into the dictionary, the size of the kernel
dictionary cannot be precisely controlled. Therefore, this paper proposes a new channel prediction algorithm based on the kernel method that maintains the size of the kernel dictionary within
a fixed budget while precisely tracking the fast time-varying dynamic characteristics.
In order to solve this problem and render the online kernel method feasible, researchers
proposed various sparseness methods or criteria, such as approximate linear dependency
(ALD) [17], the novelty criterion (NC) [20], the surprise criterion (SC) [21], and the coherence
criterion (CC) [22]. On the basis of these sparseness methods or criteria, only new input
samples that meet the prerequisites are added to the dictionary. However, these sparse
methods cannot precisely control the size of the kernel dictionary, which motivated us to
introduce a sliding window to control the size of the kernel dictionary. Recently, some KRLS-
based algorithms with an inserted forgetting factor have achieved better performance than
that of QKRLS algorithms [25,26] and ALD-KRLS, which motivated us to insert a forgetting
factor into the proposed SSW-KRLS algorithm. This paper proposes a new sparse sliding-
window KRLS algorithm where a candidate discard set is selected through correlation
analysis between the mapping vectors in the kernel Hilbert spaces of the new input sample
and the existing samples in the kernel dictionary; then, the discarded sample is determined
in combination with its corresponding output to achieve dynamic sample updates.
1.2. Contributions
The main contributions of this paper are summarized as follows:
• We propose a novel sparse sliding-window KRLS algorithm. To precisely control the number of samples, we introduce a sample budget as a size restriction. When the dictionary is smaller than the sample budget, the new sample is directly added to the dictionary. Otherwise, the best sample to discard is chosen according to our proposed new criterion.
• To differentiate the value of samples collected at different times, we introduce a forgetting matrix. By setting different forgetting values for samples collected at different times, we quantify the time value of the samples. An older sample has a smaller forgetting value, meaning that its time value is smaller. In this way, both the correlation between samples and their time value are considered when discarding old samples.
• For our new method of discarding old samples, we set a candidate set from which we decide which sample to discard. The candidate set is obtained by adding the samples whose kernel functions with the new sample are larger than a threshold and that are therefore highly correlated with the new sample. Then, we conduct a weighted estimation of the output value of these samples and decide which sample to discard on the basis of the deviation between its output and the estimated value.
2. System Model
We considered two typical user movement scenarios: an urban road, where the moving speed of users is 60 km/h, and a highway, where the moving speed of users is 120 km/h. As shown in Figure 1, in TDD systems, the user sends an SRS, and the base station runs channel estimation algorithms such as LS and MMSE. The channel matrix is first estimated in the frequency domain; then, the channel frequency response is transformed into the time domain by an IDFT. The noise is evenly distributed over the whole time domain and is easily eliminated using a raised-cosine window. Therefore, the effect of noise is very small compared to that of the user's high mobility. Due to channel reciprocity in TDD systems, uplink channel CSI can be directly used in the design of downlink beamforming to realize cooperative signal processing. However, the nonstationary fast-fading characteristics of mobile environments bring challenges to multiple-input multiple-output (MIMO) communications.
When the terminal is moving at a high speed, the Doppler frequency shift grows, and the time variability of the channel is severe. The measured uplink channel CSI cannot represent the real channel state of the downlink slots, resulting in performance degradation and a mismatch between the measured CSI and the actual channel. As shown in Figure 2,
during the SRS reporting cycle, the user can be regarded as moving in a fixed direction
at a fixed speed. In two adjacent time slots, due to the short moving distance of the user,
the amplitude and phase of the direct and scattered components have a certain correlation.
When the user speed is low, an AR-based channel prediction method can achieve good
performance by utilizing the linear correlation between adjacent time slots. However, when
the user is moving at a high speed, the channel correlation between adjacent time slots
presents nonlinear characteristics. Kernel methods are needed to exploit the nonlinear
correlation characteristics of the channel.
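To make the contrast concrete, a minimal sketch of such an AR-based linear predictor is given below; the AR order, the least-squares fitting, and the scalar-tap setup are illustrative assumptions rather than the configuration used in this paper.

```python
import numpy as np

def ar_predict(history, order=4):
    """Predict the next sample of a scalar channel coefficient with an
    AR(order) model fitted by least squares over the given history."""
    h = np.asarray(history, dtype=float)
    # Build the regression matrix: each row holds `order` past samples.
    rows = len(h) - order
    X = np.stack([h[k:k + order] for k in range(rows)])
    y = h[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h[-order:] @ coeffs

# Toy usage on a slowly varying channel tap: at low "speed", the linear
# correlation between adjacent samples makes AR prediction accurate.
taps = np.cos(0.05 * np.arange(50))
print(ar_predict(taps))  # close to cos(0.05 * 50)
```

At high user speed, the relation between adjacent slots becomes nonlinear, and a fixed linear regression of this kind no longer tracks it well, which motivates the kernel approach.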
[Figure 2 omitted: TDD slot pattern D D D S U U D D D (D: downlink slot, S: special slot, U: uplink slot), with a scatterer and a user moving at speed v; adjacent slots are correlated.]
The user sends SRS in special time slots, and the BS performs the channel measurement
and estimation to obtain the channel matrix. As shown in Figure 3, the channel matrix
measured by the BS in the $i$-th SRS period is assumed to be $H_i \in \mathbb{C}^{N_r \times N_t}$, where $N_t$ and $N_r$ represent the numbers of antennas at the BS and the mobile UE, respectively. The real and imaginary parts are processed separately in the subsequent algorithm, and $H$ represents a real matrix in the later representation. The prediction order means that each channel matrix is related to the channel matrices measured in the previous `order` SRS periods, so the input vector of the prediction system is

$$u_i = [H_i, H_{i+1}, \dots, H_{i+\text{order}-1}]. \qquad (1)$$

There are complex and unknown dependencies between $u_i$ and $H_{i+\text{order}}$, i.e., $H_{i+\text{order}} = f(u_i) = f([H_i, H_{i+1}, \dots, H_{i+\text{order}-1}])$, that need to be exploited by kernel methods.
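The following is a minimal sketch of forming the input–output pairs implied by Equation (1), assuming each channel matrix is flattened into a real vector as described in the text; the helper name, matrix shapes, and prediction order are illustrative.

```python
import numpy as np

def make_training_pairs(H_seq, order):
    """Form (u_i, target) pairs per Equation (1): u_i stacks `order`
    consecutive (flattened, real-valued) channel matrices, and the
    target is the matrix measured one SRS period later."""
    inputs, targets = [], []
    for i in range(len(H_seq) - order):
        u_i = np.concatenate([H_seq[i + k].ravel() for k in range(order)])
        inputs.append(u_i)
        targets.append(H_seq[i + order].ravel())
    return np.array(inputs), np.array(targets)

# Hypothetical example: 100 SRS periods of a real-valued 4x64 channel matrix.
H_seq = [np.random.randn(4, 64) for _ in range(100)]
U, Y = make_training_pairs(H_seq, order=3)
print(U.shape, Y.shape)  # (97, 768) and (97, 256)
```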
Figure 3. An illustration of the input and output pairs in a channel prediction module.
The KRLS algorithm seeks the weight vector that minimizes the cost function

$$J = \sum_{j=1}^{i} \left( y_j - w_i^T \varphi_j \right)^2 = \left\| \Phi_i^T w_i - y_i \right\|^2, \qquad (2)$$

where $y_j$ is the desired output and $\varphi_j = \varphi(u_j)$ denotes the feature-space
mapping, and the exact mapping function is unknown. To avoid overfitting when the
samples are small, we used L2 regularization, so the cost function is reformulated as:
$$J = \left\| \Phi_i^T w_i - y_i \right\|^2 + \lambda \| w_i \|^2. \qquad (3)$$
By letting $\partial J / \partial w_i = 0$, we can obtain that

$$w_i = \Phi_i \alpha(i). \qquad (4)$$

By substituting (4) into (3), the cost function can be reformulated as:
$$J = \left\| \Phi_i^T \Phi_i \alpha(i) - y_i \right\|^2 + \lambda \| w_i \|^2 = \left\| K_i \alpha(i) - y_i \right\|^2 + \lambda \| w_i \|^2, \qquad (5)$$

where $K_i$ is the kernel matrix whose $(m, n)$-th element is $K_i(m, n) = \kappa(u_m, u_n)$. The problem is to solve
where $k_i = \Phi_{i-1}^T \varphi_i$. Thus, $Q(i)$ can be solved using the inversion of the partitioned matrix:

$$Q(i) = r(i)^{-1} \begin{bmatrix} Q(i-1)\, r(i) + z(i) z(i)^T & -z(i) \\ -z(i)^T & 1 \end{bmatrix}, \qquad (8)$$
By relying on the kernel trick, the traditional KRLS algorithm can deal with nonlinear
problems by nonlinearly transforming data into a high-dimensional reproducing kernel
Hilbert space, which is similar to other techniques such as support vector machines (SVMs).
Compared to SVMs, it avoids large-scale high-dimensional computing through iterative
computations. However, the dictionary of the traditional KRLS algorithm grows linearly with the number of processed samples, resulting in growing complexity for each consecutive update if no additional measures are taken. In order to render online kernel algorithms feasible, growth
is typically slowed down by approximately representing the solution using only a subset
of bases that are considered to be relevant according to a chosen criterion. In our proposed
SSW-KRLS algorithm, we took similar operations to avoid an increase in computation,
which is discussed in detail in Section 4.
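To make the growth issue concrete, the following is a minimal sketch of the traditional regularized KRLS recursion built around the partitioned-matrix inverse of Equation (8). The Gaussian kernel, the parameter values, and the scalar-output setting are illustrative assumptions (each entry of the real-valued channel matrix would be predicted analogously); the excerpt does not fix these choices.

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    # kappa(u, v) = exp(-||u - v||^2 / (2 sigma^2)); an assumed kernel choice.
    d = np.asarray(u) - np.asarray(v)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

class KRLS:
    """Traditional regularized KRLS: the dictionary grows with every sample,
    which is exactly the storage/computation problem discussed above."""

    def __init__(self, kernel=gaussian_kernel, lam=1e-3):
        self.kernel, self.lam = kernel, lam
        self.dictionary = []          # all processed inputs (keeps growing)
        self.alpha = None             # dual coefficients
        self.Q = None                 # (K + lam*I)^{-1}

    def update(self, u, y):
        if not self.dictionary:
            k0 = self.kernel(u, u) + self.lam
            self.Q = np.array([[1.0 / k0]])
            self.alpha = np.array([y / k0])
        else:
            h = np.array([self.kernel(d, u) for d in self.dictionary])
            z = self.Q @ h
            r = self.lam + self.kernel(u, u) - h @ z   # Schur complement r(i)
            e = y - h @ self.alpha                     # prediction error e(i)
            # Grow Q via the partitioned-matrix inverse of Equation (8).
            self.Q = np.block([[self.Q * r + np.outer(z, z), -z[:, None]],
                               [-z[None, :], np.ones((1, 1))]]) / r
            self.alpha = np.append(self.alpha - z * e / r, e / r)
        self.dictionary.append(np.asarray(u))

    def predict(self, u):
        h = np.array([self.kernel(d, u) for d in self.dictionary])
        return float(h @ self.alpha)
```

Note that every call to `update` appends to `dictionary`, so per-step cost and memory grow without bound; this is the behavior the sparsification criteria below are designed to curb.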
The performance of the above methods in filtering samples largely depends on the selected threshold. As time goes by, the number of samples increases slowly and eventually stabilizes, while the update of the model parameters tends to be slow. This is not conducive to tracking time-varying channels. Furthermore, when the user moves into a new environment, outdated samples are retained, which is not conducive to channel prediction.
An effective way to keep the dictionary updated while precisely controlling its size
is using the sliding window. A simple implementation is discarding the oldest sample
directly each time a new sample is collected, but it lacks the judgement of correlation
between samples. Sparseness simply based on time information is unreliable and may
result in unstable prediction performance when the window is small. In order to solve this
problem, we propose a new algorithm, SSW-KRLS, where we take both the time value and
correlation between samples into consideration.
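For contrast, the simple implementation mentioned above, discarding the oldest sample each time a new one arrives, can be sketched in a few lines; the names and the budget value are hypothetical.

```python
from collections import deque

# Naive sliding window: a FIFO dictionary of fixed size M. Each new sample
# evicts the oldest one, regardless of how informative it still is.
M = 90
window = deque(maxlen=M)   # holds (input, output) pairs

def naive_update(u_new, y_new):
    window.append((u_new, y_new))   # oldest pair is dropped automatically
    # ...update the predictor on the contents of `window` here...
```

SSW-KRLS replaces this purely chronological eviction with the correlation- and output-aware criterion developed next.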
In the proposed SSW-KRLS algorithm, the cost function to be minimized is

$$\min_{w} \sum_{j=1}^{i} \beta^{i-j} \left( y_j - w^T \varphi_j \right)^2 + \lambda B(i)\, w^T w, \qquad (9)$$

where $\beta$ is the forgetting factor, $i$ is the iteration number, $y_i = (y_1, \dots, y_i)^T$ is the output vector, $w$ is the weight vector, $B(i) = \mathrm{diag}(\beta^{L-1}, \beta^{L-2}, \dots, 1)$ is the forgetting matrix, $\varphi_j = \varphi(u_j)$ is the transformation of $u_j$, and $\lambda$ is the regularization parameter. The optimal $w^*$ is solved as

$$w^* = \left( \lambda B(i) + \Phi_i B(i) \Phi_i^T \right)^{-1} \Phi_i B(i)\, y_i. \qquad (10)$$
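As a small illustration, the forgetting matrix can be built directly from its diagonal definition, and its recursive growth (Equation (16) below) can be checked numerically; the window length and the value of β used here are arbitrary.

```python
import numpy as np

def forgetting_matrix(L, beta):
    """B = diag(beta^(L-1), beta^(L-2), ..., 1): older samples get smaller
    weights, so their contribution to the cost in Equation (9) decays."""
    return np.diag(beta ** np.arange(L - 1, -1, -1))

def grow_forgetting(B_prev, beta):
    """Recursive growth matching Equation (16): B(i) = [[beta*B(i-1), 0], [0, 1]]."""
    L = B_prev.shape[0]
    B = np.zeros((L + 1, L + 1))
    B[:L, :L] = beta * B_prev
    B[L, L] = 1.0
    return B

# Sanity check: growing a length-2 matrix reproduces the length-3 definition.
assert np.allclose(grow_forgetting(forgetting_matrix(2, 0.97), 0.97),
                   forgetting_matrix(3, 0.97))
```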
Suppose that $u_i$ is a new sample and $u_j$ is an existing sample in the dictionary. Let $k(u_j)$ denote the kernel vector of $u_j \in D$, i.e., $k(u_j) = \{ k_{jn} \}_{n \neq i, j}$, where $k_{jn} = \kappa(u_j, u_n)$. To measure the correlation between the existing and new samples, we calculate the cosine value of $k(u_i)$ and $k(u_j)$ for all $u_j \in D$ as

$$\cos\left(k(u_i), k(u_j)\right) = \frac{k(u_i)\, k(u_j)^T}{|k(u_i)|\, |k(u_j)|}.$$

When the size of the dictionary has not reached the budget, we find the sample with the highest correlation with the new sample: the existing sample $u_j$ with the highest cosine value is the most likely to be discarded, where $j = \arg\max_j \cos(k(u_i), k(u_j))$. We set a threshold $\tau$ and discard sample $u_j$ if $\cos(k(u_i), k(u_j)) \ge \tau$. The updated dictionary is:
$$D(i) = \begin{cases} [D(i-1) \backslash u_j,\; u_i], & \text{if } \max_j \cos(k(u_i), k(u_j)) \ge \tau \\ [D(i-1),\; u_i], & \text{if } \max_j \cos(k(u_i), k(u_j)) < \tau \end{cases} \qquad (12)$$
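A sketch of this cosine-correlation test is given below, with the kernel vectors evaluated against the remaining dictionary members as defined above. The helper names are hypothetical, and the sketch assumes the dictionary already holds at least two samples so that the kernel vectors are non-empty.

```python
import numpy as np

def cosine_correlation(u_new, u_j, dictionary, kernel):
    """cos(k(u_i), k(u_j)): compare the kernel vectors of the new sample
    u_i and dictionary member u_j against the remaining members."""
    others = [d for d in dictionary if d is not u_j]
    k_i = np.array([kernel(u_new, d) for d in others])
    k_j = np.array([kernel(u_j, d) for d in others])
    return (k_i @ k_j) / (np.linalg.norm(k_i) * np.linalg.norm(k_j))

def update_dictionary(dictionary, u_new, kernel, tau):
    """Equation (12): replace the most correlated member if the maximal
    cosine exceeds tau; otherwise simply append the new sample."""
    cosines = [cosine_correlation(u_new, d, dictionary, kernel)
               for d in dictionary]
    j = int(np.argmax(cosines))
    if cosines[j] >= tau:
        dictionary = dictionary[:j] + dictionary[j + 1:]
    return dictionary + [u_new]
```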
In another case, when the size of the dictionary reaches the fixed budget, one sample
must be discarded from the kernel dictionary in each time slot. In our strategy, we first
set a candidate discard set on the basis of the correlation with the new sample, and then
determined the optimal discarded sample according to its output value.
Candidate discard set $S$ is composed of the samples that have high correlation with the new sample. Specifically, an existing sample $u_j \in D$ is added into $S$ if $\kappa(u_i, u_j) > \varepsilon$, where $u_i$ is the new sample and $\varepsilon$ is a threshold. Among the samples in $S$, the one of smallest value for prediction may be the one that carries the most similar information to the new sample, or the one that is untypical, with little probability of occurring. We determine the discarded sample in combination with its corresponding output as follows.
Suppose that the candidate discard set is $S = \{u_1, u_2, \dots, u_n\}$. The weighted average of the output values can be obtained as $\bar{y} = \sum_{j=1}^{n} w_j y_j$, where

$$w_j = \frac{\kappa(u_i, u_j)}{\sum_{j=1}^{n} \kappa(u_i, u_j)}. \qquad (13)$$
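The candidate-set construction and the weighted estimate of Equation (13) can be sketched as follows. Since the excerpt elides the final selection rule, discarding the sample whose output deviates most from the weighted estimate is an assumption here, and scalar outputs are used for brevity.

```python
import numpy as np

def candidate_set(u_new, dictionary, kernel, eps):
    # S collects the indices of dictionary members with kappa(u_i, u_j) > eps.
    return [j for j, u_j in enumerate(dictionary) if kernel(u_new, u_j) > eps]

def choose_discard(u_new, S_inputs, S_outputs, kernel):
    k = np.array([kernel(u_new, u_j) for u_j in S_inputs])
    w = k / k.sum()                        # weights of Equation (13)
    y_bar = w @ np.asarray(S_outputs)      # weighted average output
    dev = np.abs(np.asarray(S_outputs) - y_bar)
    return int(np.argmax(dev))             # assumed rule: largest deviation
```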
Combining $B(i)$ and $y_i$, we can obtain the exponentially weighted output vector $\bar{y}_i$. We can find that

$$B(i) = \begin{bmatrix} \beta B(i-1) & 0 \\ 0^T & 1 \end{bmatrix}, \qquad (16)$$

$$K_i = \begin{bmatrix} K_{i-1} & h(i) \\ h(i)^T & \kappa(u_i, u_i) \end{bmatrix}, \qquad (17)$$

$$\bar{y}_i = \begin{bmatrix} \beta \bar{y}_{i-1} \\ y_i \end{bmatrix}, \qquad (18)$$
where

$$z^B(i) = Q(i-1) B(i-1) h(i), \qquad z(i) = Q(i-1)^T h(i),$$

$$r(i) = \kappa(u_i, u_i) + \lambda - h(i)^T z^B(i), \qquad (22)$$

$$\alpha(i) = \begin{bmatrix} \alpha(i-1) - z^B(i)\, r(i)^{-1} e(i) \\ r(i)^{-1} e(i) \end{bmatrix}, \qquad (23)$$

and $e(i) = y_i - h(i)^T \alpha(i-1)$ is the prediction error in the $i$-th time slot.
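Putting the recursion together, one grow-step can be sketched as below. The updates for $e(i)$, $z(i)$, $z^B(i)$, $r(i)$, $\alpha(i)$, and $B(i)$ follow Equations (16), (22), and (23); the growth of $Q(i)$ is an assumption modeled on Equation (8) with $z^B(i)$ and $z(i)$ in place of the symmetric $z(i)$, since the intermediate equations are elided in this excerpt, and the function name is hypothetical.

```python
import numpy as np

def ssw_krls_grow(Q, alpha, B, h, k_uu, y_new, lam, beta):
    """One grow-step of the SSW-KRLS recursion with forgetting."""
    e = y_new - h @ alpha                      # prediction error e(i)
    z_B = Q @ (B @ h)                          # z^B(i) = Q(i-1) B(i-1) h(i)
    z = Q.T @ h                                # z(i)   = Q(i-1)^T h(i)
    r = k_uu + lam - h @ z_B                   # r(i), Equation (22)
    alpha = np.append(alpha - z_B * e / r, e / r)   # Equation (23)
    n = Q.shape[0]
    # Assumed Q growth, analogous to Equation (8) with z^B / z.
    Q_new = np.empty((n + 1, n + 1))
    Q_new[:n, :n] = Q + np.outer(z_B, z) / r
    Q_new[:n, n] = -z_B / r
    Q_new[n, :n] = -z / r
    Q_new[n, n] = 1.0 / r
    # The forgetting matrix grows per Equation (16).
    B_new = np.block([[beta * B, np.zeros((n, 1))],
                      [np.zeros((1, n)), np.ones((1, 1))]])
    return Q_new, alpha, B_new
```

When the dictionary is at its budget, this grow-step is paired with the discard step of Section 4 (via the Appendix B downdate), so the sizes of $Q$, $\alpha$, and $B$ stay fixed.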
5. Performance Evaluation
Based on the analysis above, we present the algorithmic steps as Algorithm 2. In this section, we show the simulation results of our proposed SSW-KRLS algorithm and compare its performance to that of the ALD-KRLS algorithm as a baseline. The basic simulation parameters are listed in Table 2. We adopted a 3D urban macro scenario and considered a typical setting of 3 kHz with 30 kHz of subcarrier spacing. We considered 20 MHz of bandwidth containing 51 resource blocks. Our adopted channel model was the CDL-A channel model, and the DL precoder was RZF.
Figure 5 shows the normalized mean squared error (NMSE) performance of the different algorithms. We show the performance of the ALD-KRLS and SSW-KRLS algorithms at the 60 and 120 km/h velocity levels for all UEs. When the UE speed was 60 km/h, the prediction algorithms were more accurate than at a UE speed of 120 km/h. Regarding the performance of the SSW-KRLS algorithm, we set different sample budgets: M = 30, 90, 150. The algorithm performed better with a higher sample budget. The SSW-KRLS algorithm with sample budget M = 30 performed better than the ALD-KRLS algorithm with v = 0.001, and the SSW-KRLS algorithm with sample budget M = 90 greatly outperformed the ALD-KRLS algorithm. However, the performance improved insignificantly when changing the sample budget from 90 to 150.
[Figure 5 plot omitted: NMSE (dB) versus iteration (0–500) for SSW-KRLS with M = 30, 90, 150 and ALD-KRLS with v = 0.0005 and v = 0.0001, with curve groups annotated at user speeds v = 60 km/h and v = 120 km/h.]
Figure 5. NMSE comparison of the proposed SSW-KRLS algorithm and the ALD-KRLS algorithm.
β = 0.97, Nt = 64, Nr = 4.
Figure 6 shows the kernel dictionary size over 400 iterations. The kernel dictionary size of all algorithms first grew and then the growth slowed down; the size for SSW-KRLS then remained unchanged once the sample budget was reached. The SSW-KRLS algorithm with sample budget M = 90 outperformed the ALD-KRLS algorithm with v = 0.001 while using fewer samples. This indicates the superiority of our proposed SSW-KRLS algorithm.
[Figure 6 plot omitted: kernel dictionary size (0–160) versus iteration (0–400) for SSW-KRLS with M = 30, 90, 150 and ALD-KRLS with v = 0.0005 and v = 0.0001.]
Figure 6. The kernel dictionary size of the proposed SSW-KRLS algorithm and ALD-KRLS algorithm
varies with the number of iterations. β = 0.97, Nt = 64, Nr = 4.
Figures 7 and 8 show the mean rate of mobile users under different algorithms at speeds of 60 and 120 km/h, respectively. Comparing the figures at the two speed levels shows that the user with the lower speed had a higher mean rate. In addition, the performance
could be enhanced greatly with the use of the channel prediction algorithm. Particularly,
our proposed SSW-KRLS algorithm with sample budgets M = 150 and M = 90 achieved
better performance than that of the ALD-KRLS algorithm with v = 0.0001; for our proposed
SSW-KRLS algorithm, a higher sample budget brought about better performance. Moreover,
users with more antennas showed better performance under any circumstances.
[Figure 7 plot omitted: mean rate (bps/Hz) versus SNR (−5 to 25 dB) for no prediction, SSW-KRLS with M = 150 and M = 90, and ALD-KRLS with v = 0.0001, shown for antenna settings N = 128 and N = 16.]
Figure 7. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS
algorithm, and the proposed SSW-KRLS algorithm. β = 0.97, v = 60 km/h.
[Figure 8 plot omitted: mean rate (bps/Hz) versus SNR (−5 to 25 dB) for no prediction, SSW-KRLS with M = 150 and M = 90, and ALD-KRLS with v = 0.0001, shown for antenna settings N = 128 and N = 16.]
Figure 8. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS
algorithm, and the proposed SSW-KRLS algorithm. β = 0.97, v = 120 km/h.
6. Conclusions
This paper proposed a new sparse sliding-window KRLS algorithm where a candidate
discard set is selected through correlation analysis between the mapping vectors in kernel
Hilbert spaces of the new input sample and the existing samples in the kernel dictionary.
It then determines the discarded sample in combination with its corresponding output
to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm,
which maintains the size of the kernel dictionary within the sample budget, requires a
fixed amount of memory and computation per time step, incorporates regularization, and
achieves online prediction. Moreover, in order to sufficiently track strongly changeable
dynamic characteristics, a forgetting factor was considered in the proposed algorithm.
Numerical simulations demonstrated that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieved superior performance in terms of both predictive accuracy and kernel dictionary size compared with the ALD-KRLS algorithm. The NMSE of channel prediction for the SSW-KRLS algorithm with M = 90 was about 2 dB lower than that of the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller.
Author Contributions: Investigation, X.A. and J.Z.; methodology, X.A.; project administration, X.A.;
software, X.A.; resources, H.Z.; writing—original draft preparation, J.Z.; writing—review and editing,
H.Z.; funding acquisition, X.A. and Y.S. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was funded by the ZTE Corporation Research Program.
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
For a non-singular matrix $A$, suppose a new column and a new row are added to obtain $E$ as follows:

$$E = \begin{bmatrix} A & b \\ b^T & c \end{bmatrix}. \qquad (A1)$$
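The bordered inverse can be checked numerically via the Schur complement $s = c - b^T A^{-1} b$; the matrix sizes and values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = A @ A.T + 4 * np.eye(4)  # well-conditioned
b = rng.standard_normal(4)
c = 5.0

E = np.block([[A, b[:, None]], [b[None, :], np.array([[c]])]])

# Bordered inverse from the blocks of E, using s = c - b^T A^{-1} b.
Ainv_b = np.linalg.solve(A, b)
s = c - b @ Ainv_b
E_inv = np.block([
    [np.linalg.inv(A) + np.outer(Ainv_b, Ainv_b) / s, -Ainv_b[:, None] / s],
    [-Ainv_b[None, :] / s, np.array([[1.0 / s]])],
])
assert np.allclose(E_inv, np.linalg.inv(E))
```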
Appendix B
For a non-singular matrix $E$, suppose the first column and the first row are removed from the matrix to obtain $C$:

$$E = \begin{bmatrix} a & b^T \\ b & C \end{bmatrix}. \qquad (A5)$$

Suppose the inverse of $E$ is formulated as:

$$E^{-1} = \begin{bmatrix} d & e^T \\ e & F \end{bmatrix}. \qquad (A6)$$
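The inverse of the reduced matrix then follows from the blocks of $E^{-1}$ through the standard downdate identity $C^{-1} = F - e e^T / d$, which a short numerical check confirms; the sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.standard_normal((5, 5)); E = E @ E.T + 5 * np.eye(5)
E_inv = np.linalg.inv(E)

d, e, F = E_inv[0, 0], E_inv[1:, 0], E_inv[1:, 1:]
C = E[1:, 1:]                       # E with first row and column removed

# Downdated inverse from the blocks of E^{-1}: C^{-1} = F - e e^T / d.
assert np.allclose(np.linalg.inv(C), F - np.outer(e, e) / d)
```

This downdate is what lets the sliding-window update drop a dictionary sample without re-inverting the full kernel matrix.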
References
1. Li, M.; Collings, I.B.; Hanly, S.V.; Liu, C.; Whiting, P. Multicell Coordinated Scheduling with Multiuser Zero-Forcing Beamforming.
IEEE Trans. Wirel. Commun. 2016, 15, 827–842. [CrossRef]
2. Yin, H.; Wang, H.; Liu, Y.; Gesbert, D. Addressing the Curse of Mobility in Massive MIMO with Prony-Based Angular-Delay
Domain Channel Predictions. IEEE J. Sel. Areas Commun. 2020, 38, 2903–2917. [CrossRef]
3. Li, X.; Jin, S.; Suraweera, H.A.; Hou, J.; Gao, X. Statistical 3-D Beamforming for Large-Scale MIMO Downlink Systems over Rician
Fading Channels. IEEE Trans. Commun. 2016, 64, 1529–1543. [CrossRef]
4. Sapavath, N.N.; Rawat, D.B.; Song, M. Machine Learning for RF Slicing Using CSI Prediction in Software Defined Large-Scale
MIMO Wireless Networks. IEEE Trans. Netw. Sci. Eng. 2020, 7, 2137–2144. [CrossRef]
5. Huang, C.; Liu, L.; Yuen, C.; Sun, S. Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO
Systems. IEEE Trans. Signal Process. 2019, 67, 245–259. [CrossRef]
6. Bellili, F.; Sohrabi, F.; Yu, W. Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation With
Laplacian Prior. IEEE Trans. Commun. 2019, 67, 3205–3219. [CrossRef]
7. Wu, D.; Zhang, H. Tractable Modelling and Robust Coordinated Beamforming Design with Partially Accurate CSI. IEEE Wirel.
Commun. Lett. 2021, 10, 2384–2387. [CrossRef]
8. Lu, A.; Gao, X.; Xiao, C. Free Deterministic Equivalents for the Analysis of MIMO Multiple Access Channel. IEEE Trans. Inf.
Theory 2016, 62, 4604–4629. [CrossRef]
9. Wu, C.; Yi, X.; Zhu, Y.; Wang, W.; You, L.; Gao, X. Channel Prediction in High-Mobility Massive MIMO: From Spatio-Temporal
Autoregression to Deep Learning. IEEE J. Sel. Areas Commun. 2021, 39, 1915–1930. [CrossRef]
10. Chen, M.; Viberg, M. Long-Range Channel Prediction Based on Nonstationary Parametric Modeling. IEEE Trans. Signal Process.
2009, 57, 622–634. [CrossRef]
11. Liu, L.; Feng, H.; Yang, T.; Hu, B. MIMO-OFDM Wireless Channel Prediction by Exploiting Spatial-Temporal Correlation. IEEE
Trans. Wirel. Commun. 2014, 13, 310–319. [CrossRef]
12. Lv, C.; Lin, J.-C.; Yang, Z. Channel Prediction for Millimeter Wave MIMO-OFDM Communications in Rapidly Time-Varying
Frequency-Selective Fading Channels. IEEE Access 2019, 7, 15183–15195. [CrossRef]
13. Yuan, J.; Ngo, H.Q.; Matthaiou, M. Machine Learning-Based Channel Prediction in Massive MIMO with Channel Aging. IEEE
Trans. Wirel. Commun. 2020, 19, 2960–2973. [CrossRef]
14. Zhao, J.; Tian, H.; Li, D. Channel Prediction Based on BP Neural Network for Backscatter Communication Networks. Sensors
2020, 20, 300. [CrossRef]
15. Ahrens, J.; Ahrens, L.; Schotten, H.D. A Machine Learning Method for Prediction of Multipath Channels. ZTE Commun. 2019, 17,
12–18.
16. Sanchez-Fernandez, M.; de-Prado-Cumplido, M.; Arenas-Garcia, J.; Perez-Cruz, F. SVM multiregression for nonlinear channel
estimation in multiple-input multiple-output systems. IEEE Trans. Signal Process. 2004, 52, 2298–2307. [CrossRef]
17. Engel, Y.; Mannor, S.; Meir, R. The kernel recursive least-squares algorithm. IEEE Trans. Signal Process. 2004, 52, 2275–2285.
[CrossRef]
18. Liu, W.; Park, I.; Wang, Y.; Principe, J.C. Extended Kernel Recursive Least Squares Algorithm. IEEE Trans. Signal Process. 2009, 57,
3801–3814.
19. Guo, J.; Chen, H.; Chen, S. Improved Kernel Recursive Least Squares Algorithm Based Online Prediction for Nonstationary Time
Series. IEEE Signal Process. Lett. 2020, 27, 1365–1369. [CrossRef]
20. Platt, J. A Resource-Allocating Network for Function Interpolation. Neural Comput. 1991, 3, 213–225. [CrossRef]
21. Liu, W.; Park, I.; Principe, J.C. An Information Theoretic Approach of Designing Sparse Kernel Adaptive Filters. IEEE Trans.
Neural Netw. 2009, 20, 1950–1961. [CrossRef] [PubMed]
22. Richard, C.; Bermudez, J.C.M.; Honeine, P. Online Prediction of Time Series Data with Kernels. IEEE Trans. Signal Process. 2009,
57, 1058–1067. [CrossRef]
23. Yang, M.; Ai, B.; He, R.; Huang, C.; Ma, Z.; Zhong, Z.; Wang, J.; Pei, L.; Li, Y.; Li, J. Machine-Learning-Based Fast Angle-of-Arrival
Recognition for Vehicular Communications. IEEE Trans. Veh. Technol. 2021, 70, 1592–1605. [CrossRef]
24. Cherkassky, V.; Mulier, F.M. Statistical Learning Theory. In Learning from Data: Concepts, Theory, and Methods, 1st ed.; John Wiley &
Sons: Hoboken, NJ, USA, 2007.
25. Vaerenbergh, S.V.; Lazaro-Gredilla, M.; Santamaria, I. Kernel Recursive Least-Squares Tracker for Time-Varying Regression. IEEE
Trans. Neural Netw. Learn. Syst. 2012, 23, 1313–1326. [CrossRef]
26. Xiong, K.; Wang, S. The Online Random Fourier Features Conjugate Gradient Algorithm. IEEE Signal Process. Lett. 2019, 26,
740–744. [CrossRef]