
Chapter 4.

Multipath Channel Simulators


Contents
4.1. Fading Taps Generation
4.2. White Gaussian Noise Generators
4.2.1. The Box-Muller Method
4.2.2. The Use of the Central Limit Theorem
4.2.3. The LFSR as Random Number Generator
4.2.4. Generic Architecture for WGN Generation
4.3. Spectrum-Shaping Filter
4.3.1. Filter Design Algorithm
4.3.2. A Case Study
4.3.3. Implementation Guidelines
4.4. Spectrum Shifter
4.5. Polyphase Interpolator
4.5.1. Architecture Overview
4.5.2. Polyphase Coefficients Generator
4.5.3. Interpolation Functions
4.5.4. Performance Analysis
Channel simulators are essential components for controlled and repeatable test-benches of communication systems, allowing researchers to work independently and compare results. Traditionally, channel simulators have been used for software simulations of receiver/transmitter chains. Nowadays, as systems become increasingly complex, hardware acceleration using rapid prototyping solutions becomes more and more appealing. This creates the need for hardware channel simulators, as part of hardware test benches on FPGA for example.
In simulations, channel models are most often used with discrete complex base-band signals. For simplification, the WSSUS assumption introduced in Section 2.1 is used, where the fading processes are considered independent stochastic processes. Regardless of type, any channel profile must define the number of taps, their time delay relative to the first tap, and their average gain relative to the strongest tap (in dB). The tap gains can also be specified as the rms of their absolute values. For Rice channels, the Ricean factor K corresponding to the line-of-sight component also needs to be specified.

Regarding the Doppler fading spectrum of the taps, there are different characterizations, depending on the type of channel. Usually, land-mobile Rayleigh channel models assume a Jakes spectrum for all taps, with the same maximum Doppler shift f_Dmax. For ionospheric HF channels, with a Gaussian Doppler spectrum, it is more common to specify the Doppler shift D_sh and the Doppler spread D_sp separately for each tap, as seen in Table 2.3.
4.1. Fading Taps Generation
The most difficult task in a channel simulator is the generation of the fading taps. The taps are complex Gaussian processes having the required Doppler spectrum. So far, three approaches are known in the literature [72].

The first approach, called sum of sinusoids, was proposed by Jakes [43] and consists in adding a number of sinusoids of equal amplitude and randomly distributed phases in order to generate a fading tap. The approach has been refined and extended to generate uncorrelated taps for multi-path channels [15]. Recent improvements include additional randomization, as in [75] and [102]. This method is computationally demanding due to the large number of calls to sin. An advantage, however, is the possibility of varying the Doppler spectrum continuously.
The second approach was introduced in [101] and consists in multiplying a set of independent complex Gaussian variables with a frequency mask equal to the square root of the desired power spectrum. The sequence is zero-padded and its inverse FFT is taken. The resulting sequence has the desired spectrum and is still Gaussian since the FFT operation is linear. The solution is computationally efficient but has the disadvantage that it is block-oriented in nature.
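As an illustration of this frequency-domain method, the sketch below generates one block of fading samples using a Jakes-shaped mask. It is only a sketch under assumed parameter names (the function and its arguments are not from the cited work [101]); the mask shape would change for other Doppler spectra.

```python
import numpy as np

def fading_taps_ifft(n_samples, f_doppler, f_sample, seed=0):
    """Frequency-mask method: complex Gaussian spectral samples are weighted
    by the square root of the desired Doppler power spectrum, zero-padded,
    and transformed to the time domain with an inverse FFT."""
    rng = np.random.default_rng(seed)
    # Number of non-zero spectral bins covered by the Doppler bandwidth
    # (assumes n_samples is much larger than 2 * n_bins).
    n_bins = max(2, int(n_samples * f_doppler / f_sample))
    # Jakes power spectrum S(f) ~ 1/sqrt(1 - (f/fD)^2), sampled inside (-fD, fD).
    f = np.linspace(-1 + 1e-3, 1 - 1e-3, 2 * n_bins)
    mask = np.sqrt(1.0 / np.sqrt(1.0 - f**2))      # sqrt of the power spectrum
    g = rng.standard_normal(2 * n_bins) + 1j * rng.standard_normal(2 * n_bins)
    shaped = g * mask
    spectrum = np.zeros(n_samples, dtype=complex)  # zero padding
    spectrum[:n_bins] = shaped[n_bins:]            # positive frequencies
    spectrum[-n_bins:] = shaped[:n_bins]           # negative frequencies
    tap = np.fft.ifft(spectrum)                    # still Gaussian: the IFFT is linear
    return tap / np.sqrt(np.mean(np.abs(tap)**2))  # normalize to unit average power
```

Note the block-oriented nature mentioned above: each call produces one self-contained block of `n_samples` values, and successive blocks are not continuous.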
The third approach, which is also used in this thesis, consists in filtering white Gaussian noise (WGN) with a filter whose frequency response equals the square root of the Doppler spectrum. The WGN can be easily generated with a pseudo-random source. Since the WGN has a flat spectrum, the spectrum of the filtered signal will be exactly the transfer characteristic of the Doppler shaping filter.
We need to note that for a discrete channel of sampling period T and Doppler frequency f_D, the spectrum is limited to a discrete frequency of f_D·T. For most applications this frequency is very small. For instance, for a system in the 900 MHz band and a vehicle speed of 100 km/h, with a sampling rate of 1 Msps, the discrete Doppler rate is only 0.000083. The spectrum shaping filter will have to be extremely narrow band, with a sharp cut-off and infinite stop-band attenuation. These requirements cannot be satisfied with a FIR filter of practical length. Even an IIR filter, although it reduces the number of necessary taps, would be too hard to realize. The solution is to combine an IIR filter with a polyphase interpolator. The polyphase interpolation avoids all unnecessary multiplications with zero and can handle very large interpolation factors.
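The numerical example above can be reproduced in a few lines (all constants are the ones stated in the text):

```python
# Discrete Doppler rate: 900 MHz carrier, 100 km/h vehicle speed, 1 Msps.
c = 3e8                    # speed of light, m/s
f_c = 900e6                # carrier frequency, Hz
v = 100 / 3.6              # vehicle speed, m/s
f_s = 1e6                  # sampling rate, samples/s

f_doppler = v * f_c / c    # maximum Doppler shift, Hz
discrete_rate = f_doppler / f_s

print(f_doppler)           # ~83.3 Hz
print(discrete_rate)       # ~8.33e-05, i.e. 0.000083
```

The resulting normalized bandwidth of roughly 1e-4 is what makes a direct FIR realization impractical and motivates the IIR filter plus interpolator combination.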
The complete block schematic of a complex fading tap generator is shown in Figure 4.1. The two WGN generators are independent, the Doppler filters are identical, and the interpolator is shared (that is, only its control logic). Between Doppler spectrum shaping and interpolation there is an additional block for introducing a Doppler shift in the spectrum. In the following, we will deal with these blocks in more detail.
Figure 4.1.: Block schematic of a complex fading tap generator (two WGN generator + Doppler filter branches for the real and imaginary parts, followed by the spectrum shifter and the interpolator)
4.2. White Gaussian Noise Generators
So far there has been little previous work in the area of digital HW generation of white Gaussian
noise. Three representative publications, [25] [4] [52], address this issue and propose FPGA
implementations.
The most popular methods for generating pseudo-random Gaussian noise in HW are:
1. The Box-Muller method [5]
2. The use of the central limit theorem [50]
4.2.1. The Box-Muller Method
This method produces two independent Gaussian variables by transforming two independent variables u1 and u2 with uniform distribution over the interval [0, 1). First, a set of intermediate functions are computed:

f(u1) = √(-ln(u1))   (4.1)
g1(u2) = √2 · sin(2π·u2)   (4.2)
g2(u2) = √2 · cos(2π·u2)   (4.3)

The products x1 and x2 will have Gaussian distributions with zero mean and σ = 1:

x1 = f(u1) · g1(u2)   (4.4)
x2 = f(u1) · g2(u2)   (4.5)

The HW implementation of these equations is straightforward, as shown in Figure 4.2. The challenge is how to implement the square root of the logarithm and the sine/cosine efficiently. An efficient solution is to employ hybrid look-up tables with non-uniform segmentation, as proposed in [25] and [52]. Nevertheless, the resulting circuit size is still large and it is not easy to make the design generic and scalable.
Figure 4.2.: Gaussian random generator using the Box-Muller method (two uniform generators produce u1 and u2, which drive the f(u1), g1(u2), and g2(u2) blocks whose products give x1 and x2)
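For reference, the transform of Eqs. (4.1) to (4.5) can be sketched as a floating-point software model (not the LUT-based hardware discussed above):

```python
import math

def box_muller(u1, u2):
    """Box-Muller transform, factored as in Eqs. (4.1)-(4.5):
    f(u1) = sqrt(-ln u1), g1(u2) = sqrt(2)*sin(2*pi*u2),
    g2(u2) = sqrt(2)*cos(2*pi*u2).
    u1 must lie in (0, 1] so that the logarithm is defined."""
    f = math.sqrt(-math.log(u1))
    g1 = math.sqrt(2.0) * math.sin(2.0 * math.pi * u2)
    g2 = math.sqrt(2.0) * math.cos(2.0 * math.pi * u2)
    return f * g1, f * g2  # two independent N(0, 1) samples

# example: u1 = 0.5, u2 = 0.25 gives x1 = sqrt(2*ln 2), x2 = 0
x1, x2 = box_muller(0.5, 0.25)
```

A useful sanity check is the identity x1² + x2² = -2·ln(u1), which follows directly from sin² + cos² = 1.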
4.2.2. The Use of the Central Limit Theorem

The central limit theorem offers a very simple way to generate random Gaussian noise. According to the central limit theorem, the distribution of a sum of independent random variables with arbitrary distributions converges to a normal distribution as the number of variables increases. If N independent variables have equal distributions with mean μ and standard deviation σ, the distribution of their sum will have μ_N = N·μ and σ_N = σ·√N. Intuitively, the probability density function (pdf) of the sum of two independent variables can be obtained as the convolution of their pdfs.
In order to gain an estimate of the complexity, we proceed to determine how many uniformly distributed variables need to be summed to achieve a Gaussian distribution with a given accuracy. The accuracy is given by the relative error ε(x) of the resulting distribution X(x) against an ideal Gaussian distribution N(x) with the same σ:

ε(x) = (X(x) - N(x)) / N(x)   (4.6)

We also determine the absolute error between X(x) and N(x), considering N(0) as normalized reference level. The results are shown in Figure 4.3.
Figure 4.3.: Error of a Gaussian distribution obtained from uniform distributions: (a) relative error and (b) absolute error versus sample value (in units of σ), for sums of 16, 32, 64, 128, 256, and 512 variables
These results show that in order to achieve a good accuracy, especially at high sample values, we need to sum a very large number of uniformly distributed random variables. For example, in order to get 1% accuracy at 4σ, more than 512 random variables have to be summed. At a first glance, this might seem a very tough design problem. However, by finding efficient solutions for generating uniform distributions, the complexity can be significantly reduced.
For digital generators, the most straightforward solution is to sum N discrete variables with uniform distribution. In the case of a uniform distribution, the variance σ_u² has the following expression, where M is the number of discrete values in the distribution:

σ_u² = (M² - 1) / 12   (4.7)

If the uniform variable has a width of B bits, the mean μ_u and the standard deviation σ_u have
Bits    1      2^1    2^2     2^3     2^4     2^5     2^6     2^7     2^8     2^9     2^10
Width   1      2      3       4       5       6       7       8       9       10      11
Range   0...1  0...2  0...2^2 0...2^3 0...2^4 0...2^5 0...2^6 0...2^7 0...2^8 0...2^9 0...2^10
Mean    1/2    2^0    2^1     2^2     2^3     2^4     2^5     2^6     2^7     2^8     2^9
Stdev   1/2    1/√2   2^0     2^0·√2  2^1     2^1·√2  2^2     2^2·√2  2^3     2^3·√2  2^4

Table 4.1.: Properties of the sum of 2^N random binary variables
the following expressions:

μ_u = (2^B - 1) / 2   (4.8)

σ_u = √((2^(2B) - 1) / 12)   (4.9)
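Equations (4.7) to (4.9) can be verified by direct enumeration of a B-bit uniform variable; a small sketch:

```python
import math

def uniform_stats(B):
    """Mean and standard deviation of a B-bit uniform variable
    (values 0 .. 2^B - 1), computed by direct enumeration, to be
    compared with the closed forms of Eqs. (4.8) and (4.9)."""
    M = 2 ** B                                     # number of discrete values
    values = range(M)
    mean = sum(values) / M                         # should equal (2^B - 1)/2
    var = sum((v - mean) ** 2 for v in values) / M # should equal (M^2 - 1)/12
    return mean, math.sqrt(var)

mean, std = uniform_stats(4)   # B = 4: mean = 7.5, sigma = sqrt(255/12)
```

For B = 4 this gives μ_u = 7.5 and σ_u ≈ 4.61, matching the closed-form expressions exactly.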
The challenge is how to design a multi-bit uniform generator and how to sum so many values, e.g. 256 for a decent accuracy. The complexity can be minimized if we sum 1-bit variables. The number of required adders remains the same, but their size will be much smaller. The problem consists now in how to generate so many independent binary variables in parallel.

Summing 1-bit uniform variables has also the advantage that both the mean and the standard deviation have simple expressions, which simplifies the hardware implementation. We hereafter denote by 2^N the number of variables to be summed, always a power of two. Table 4.1 shows various properties of the resulting sum: bitwidth, range, mean μ, and standard deviation σ. These results have been obtained by treating the binary variables as unsigned numbers and summing them accordingly. This results in a non-zero mean. When summing binary variables the mean is always a power of two, which is convenient for hardware implementations. In most practical applications, however, a zero-mean Gaussian distribution is desired. This can be achieved either by subtracting the known mean from the sum or by simply summing the binary variables as signed numbers. The latter requires no hardware and is obviously the preferred method. Figure 4.4 shows the resulting distribution (histogram) when summing 2^6 binary variables, both as signed and as unsigned. The horizontal axis shows the full range of a 7-bit number. The mean and the limits of the resulting sum are also shown on the horizontal axis. Sigma is 2^2 in both cases.
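These properties can be checked against the exact distribution of the sum, which for independent equiprobable bits is binomial; a sketch for 2^6 = 64 variables:

```python
from math import comb, sqrt

N_BITS = 64  # sum of 2^6 binary variables

# Exact pmf of the unsigned sum (binomial with p = 1/2), and the
# zero-mean version obtained by subtracting the known mean 2^5 = 32.
pmf = {k: comb(N_BITS, k) / 2.0**N_BITS for k in range(N_BITS + 1)}
mean_unsigned = sum(k * p for k, p in pmf.items())          # 32 = 2^5
mean_signed = sum((k - 32) * p for k, p in pmf.items())     # 0
sigma = sqrt(sum((k - mean_unsigned) ** 2 * p for k, p in pmf.items()))  # 4 = 2^2
```

The values μ = 2^5, σ = 2^2, and the zero mean after subtraction agree with the 2^6 column of Table 4.1.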
From both Table 4.1 and Figure 4.4 it can be observed that the range of the sum is nearly half of the full range. Another useful observation that can be derived from Table 4.1 is that the relative sigma, i.e. relative to the full range, decreases by √2 for every doubling of the number of binary variables. Figure 4.5 shows the histogram when summing 2^6 and 2^8 variables. The horizontal axis is zoomed to half of the sum range for clarity. When increasing the number of variables 4 times, the sample range also increases 4 times, whereas the absolute σ increases only 2 times, which is equivalent to a decrease of the relative σ by 2.
Figure 4.4.: Histogram of the sum of 64 (2^6) independent binary variables (left: sum as unsigned; right: sum as signed)
The more variables are summed, the narrower the range of interest becomes relative to the full range. The range of interest is specified as a multiple of sigma, usually a power of 2. Typical cases are -4σ...4σ and -8σ...8σ, depending on the application. This allows for reducing the number of bits of the generated Gaussian noise to accommodate just the range of interest. The operation consists simply in discarding a number of MSBs. For example, when summing 2^8 binary variables the full range is -128...128 with σ = 8; reducing it to the range of interest -4σ...4σ, i.e. -32...32, is performed by discarding two MSBs. These insights will be used later when designing the HW generator.
4.2.3. The LFSR as Random Number Generator

There are two types of LFSR configurations. The Galois configuration has XOR gates (modulo-2 adders) in the shift register path (Figure 4.6a) and is also referred to as in-line, internal, or modular type. The Fibonacci configuration has XOR gates in the feedback path (Figure 4.6b) and is also named out-of-line, external, or simple type. The Galois implementation is better suited for HW implementation because the critical path always contains only one XOR gate, regardless of the generator polynomial.

The binary polynomial P(x) is referred to as the generator polynomial. When this polynomial is primitive, the generated pseudo-random binary sequence has maximum length 2^L - 1. A primitive polynomial is a polynomial that generates all elements of an extension field from a base field, in our case GF(2). All primitive polynomials are also irreducible, i.e. they cannot be factored into nontrivial polynomials over the same field, but not all irreducible polynomials are primitive. Table A.1 lists primitive polynomials for LFSR lengths L between 2 and 64.
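As an illustration, a software model of a Galois LFSR using the example polynomial of Figure 4.6, P(x) = x^8 + x^4 + x^3 + x^2 + 1 (a behavioral sketch, not the VHDL implementation):

```python
def galois_lfsr(poly_mask, width, state):
    """One step of a Galois LFSR, MSB-first form. poly_mask encodes the
    generator polynomial without the leading x^width term; the XOR with
    the mask is applied when the shifted-out MSB is 1."""
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & ((1 << width) - 1)
    if msb:
        state ^= poly_mask
    return state

def period(poly_mask, width, seed=1):
    """Number of steps until the state sequence returns to the seed."""
    state = galois_lfsr(poly_mask, width, seed)
    n = 1
    while state != seed:
        state = galois_lfsr(poly_mask, width, state)
        n += 1
    return n

# P(x) = x^8 + x^4 + x^3 + x^2 + 1 -> mask 0x1D (bits 4, 3, 2, 0)
```

Since this polynomial is primitive, the generator visits all 2^8 - 1 non-zero states before repeating; the all-zero state is a fixed point and must be avoided as seed.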
Figure 4.5.: Decrease of relative sigma with the number of binary variables (histograms of the sums of 2^6 and 2^8 variables)

Figure 4.6.: LFSR configuration types, example P(x) = x^8 + x^4 + x^3 + x^2 + 1: (a) Galois configuration, (b) Fibonacci configuration

An LFSR can be used to generate multiple binary sequences in parallel, by generating more than one bit of the pseudo-random sequence every clock cycle, say M. If the number of outputs M is a power of two, the period is preserved. The M streams will be identical, of period 2^N - 1, and any two adjacent outputs will have a phase shift of 2^N / M. This is shown in Figure 4.7 for an LFSR of length 5 which generates 4 consecutive bits in parallel.
In order to determine how to generate more bits of the sequence in parallel, we redraw the schematic of the sequential Galois LFSR so as to emphasize the initial and final states of the registers (Figure 4.8a). It can now easily be seen that the LFSR consists of a state register and a combinational block which computes the next state. In order to generate M bits in parallel, M such combinational blocks have to be cascaded, as in Figure 4.8b.
The M parallel binary sequences are independent and can be added as a first step in obtaining a Gaussian distribution. Due to the cross-correlation between the sequences, the auto-correlation of the sum will have a peak at lag 2^N / M, thus reducing the effective period of the sum sequence.
Figure 4.7.: Generation of multiple bits in parallel with LFSR (four output streams, each a phase-shifted copy of the sequence b0, b1, b2, ...)
Figure 4.8.: Generating more pseudo-random bits in parallel: (a) sequential Galois LFSR (redrawn), (b) parallel LFSR (4 bits)
The number of parallel outputs that can be generated with a single LFSR is limited since it directly affects the maximum frequency. Their number should not exceed 32.

In the following we want to investigate the statistical properties of the Gaussian noise obtained by summing M consecutive bits generated by the same LFSR of length L. As a concrete case we consider M = 64. The resulting Gaussian samples range from -32 to 32, with σ = 4. Figure 4.9 shows the discrete distribution and its absolute error compared to an ideal Gaussian, for the noise obtained by summing 64 independent binary variables. The absolute error is consistent with the profile in Figure 4.3b.

Contrary to our expectations, when summing binary sequences generated by the same LFSR, the resulting distribution will not be exactly the expected one. The reason is the existing correlation between the generated binary sequences, which originate from the same generator polynomial. The resulting distributions have a very small (≪ 1) mean and a larger absolute error compared to the ideal case in Figure 4.9b. The error depends mainly on the number of taps of the LFSR and much less on their position or the LFSR length.
The absolute errors for a 3-tap and a 1-tap LFSR are shown in Figure 4.10. The generator
Figure 4.9.: Probability distribution and its absolute error for the noise obtained by summing 64 independent binary variables: (a) probability distribution, (b) absolute error
polynomials are taken from Table A.1. Only 1-tap and 3-tap irreducible polynomials are considered because for all lengths up to 256 there exists at least one such polynomial. The only notable exception occurs at length 37, for which no 1-tap or 3-tap irreducible polynomial exists, the first one having 5 taps. Although we only give the results for two LFSRs, simulations for LFSR lengths up to 64 show that all LFSRs with the same number of taps produce almost the same error profile, with a deviation of less than 20%. The error profiles have been obtained by analyzing 2^30 (approx. 1 billion) generated Gaussian samples.
Figure 4.10.: Probability distribution errors for noise obtained by summing 64 parallel outputs of an LFSR: (a) LFSR 32, 3 taps; (b) LFSR 33, 1 tap
For the 1-tap LFSR the absolute error peak is around 0.01, whereas for the 3-tap LFSR the error is much smaller, with an error peak around 0.001, still larger than the ideal case in Figure 4.9b. The error of the resulting distribution can be reduced by introducing additional randomness, which reduces the correlation. Our proposal consists in XOR-ing the generated bits with another pseudo-random sequence generated by an additional LFSR. The number of generated bits XOR-ed with the same bit of the extra sequence is a parameter of the solution, and should be a power of 2. We call this parameter the randomization step. Figure 4.11 shows the configurations for three different randomization steps: 2^0, 2^1, and 2^2.
Figure 4.11.: Schematics for randomization with different steps: (a) step = 2^0, (b) step = 2^1, (c) step = 2^2
Simulations show that applying randomization as proposed above, even with large steps, results in a distribution with zero mean. As expected, the smaller the randomization step, the lower the error. Figure 4.12 shows the error of the resulting histogram in the case of 3-tap LFSRs, when using randomization with steps 2^4, 2^2, and 2^0 respectively. The reference is the histogram of the sum of 2^6 independent binary variables, shown in Figure 4.9b.
Figure 4.12.: Error reduction when decreasing the randomization step: (a) step = 2^4, (b) step = 2^2, (c) step = 2^0
The conclusion is that when using extra randomization with step 1, i.e. each generated bit is XOR-ed with a bit of the extra sequence, the error is reduced to zero. The residual error in Figure 4.12c is actually caused by the limited number of samples (2^32) used in the simulation. Even for a randomization step of 2^4, the error is significantly below the error obtained without randomization, relative to the ideal Gaussian distribution. Moreover, simulations have shown that the relative error reduction depends on the randomization step alone, regardless of the error with no randomization. For full randomization, i.e. with step 1, any existing correlation between the parallel bitstreams that are summed is destroyed, therefore it does not matter whether the generator polynomial has 1 or 3 taps.
4.2.4. Generic Architecture for WGN Generation
A good Gaussian distribution requires the summation of a large number of binary variables,
e.g. 256. Generating so many bits in parallel with a single LFSR is not practically feasible. The
solution is to employ more than one LFSR. For 8 LFSRs with 32 outputs each we have already
256 binary variables to sum. It is essential for the LFSRs to be of dierent lengths in order to
maximize the period of the resulting sequence. If the periods of the individual sequences are
relatively prime, the period of the resulting sequence is the product of the individual periods.
Figure 4.13 shows an example conguration that illustrates the use of multiple LFSRs. This
example employs 4 LFSRs with 8 outputs each, much less than required by a practical design.
The bitwidth of the result depends on the number of binary variables, as indicated in Table 4.1.
The result becomes one bit wider with each addition. If we sum 2
N
binary sequences the
bitwidth of result will be N + 1. The adder for 2
N
binary variables is implemented using a
symmetrical tree structure, as shown in Figure 4.13. Each stage, their width increases by one
starting from 1 bit and their number is reduced by two starting from 2
N1
. If the adders are
implemented as ripple carry, as it is usually the case in FPGA, the number of elementary 1-bit
adders is given by the equation below. For 2
8
1-bit inputs, we need for example 247 elementary
adders.
Figure 4.13.: Gaussian noise generation by summing pseudo-random binary sequences (4 LFSR generators with 8 outputs each, summed by a tree of adders of increasing width)
The main properties of an adder tree for 2^N binary variables are summarized below as a function of N. We have assumed that an N-bit adder consists of N 1-bit elementary adders.

- Number of adders: 2^N - 1
- Number of adders in the critical path: N
- Number of elementary adders: Σ_{k=1}^{N-1} k·2^(N-k)
- Number of elementary adders in the critical path: Σ_{k=1}^{N-1} k = N(N-1)/2

# bits                                2^4   2^5   2^6   2^7   2^8
# elementary adders (total)           22    52    114   240   494
# elementary adders (critical path)   6     10    15    21    28

Table 4.2.: Some properties of an adder tree for 2^N bits
Table 4.2 shows the number of elementary adders (total and in the critical path) for N ranging from 4 to 8. For N = 8 the depth reaches 28, which limits the operating frequency drastically. Fortunately, the adder tree can be pipelined since there are no loops involved.

The regular structure of the adder tree makes the realization of a scalable design possible. As a proof of concept we have created a completely generic adder tree design in VHDL. The parameters are the number and the width of the input operands, and whether they are treated as signed or unsigned. The number of inputs is not constrained to a power of two. Moreover, the pipelining is configurable through an additional generic parameter that indicates the number of combinational adder stages between two registers, starting from the output. If this parameter is zero, the resulting adder tree is purely combinational. The circuit relies on the recursive instantiation feature in VHDL, which is currently supported by most of the synthesis tools.
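The adder-tree cost formulas above translate directly into a small model that reproduces the totals of Table 4.2:

```python
def adder_tree_cost(N):
    """Elementary-adder counts for a tree summing 2^N 1-bit inputs,
    assuming a k-bit ripple-carry adder costs k elementary adders.
    Stage k (k = 1 .. N-1) contains 2^(N-k) adders of width k; the
    critical path crosses one adder of each width."""
    total = sum(k * 2 ** (N - k) for k in range(1, N))
    critical_path = N * (N - 1) // 2
    return total, critical_path

# adder_tree_cost(8) -> (494, 28), matching the 2^8 column of Table 4.2
```

Note that the final N-bit adder is not counted as a separate stage here, consistent with the table's convention.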
In some applications, Gaussian noise samples need not be generated every clock cycle. An example is the white noise generator for the fading taps, where the sample rate before the interpolator is very low because of the very low Doppler spreads. This situation can be exploited to reduce the number of adders for the same number of bits to be summed, or to sum more bits with the same adder. The solution is to use an accumulator at the output to sum consecutive samples. If we need to generate a sample every 4 clock cycles, the number of bits to be generated and summed in a clock cycle is reduced by 4. Now instead of 2^8 bits we will only need to generate 2^6 in parallel. According to Table 4.2, the total number of elementary adders decreases from 494 to only 114, while the number of adders in the critical path is reduced from 28 to 15, which saves area and increases the maximum frequency.

Figure 4.15 shows the distribution of the noise generated with the architecture presented above, that is, by summing 256 binary variables. In this figure, the probability is normalized to the probability of zero and the range is limited between -4σ and 4σ. The distribution is perfectly smooth, unlike the one obtained in [25] and [52] using the Box-Muller method.
Figure 4.14.: Schematic of a Gaussian noise generator with phase accumulator (LFSR generators plus an extra randomization LFSR, a 64-input adder tree with pipeline delay compensation, a timer, and an output accumulator)
FPGA resource     Sequential factor          Total in device
                  4       8       16
# Slices          283     189     114        960
# Slice FFs       448     311     244        1920
# 4-input LUTs    499     312     238        1920

Table 4.3.: Synthesis results for a Xilinx Spartan3E XC3S100E-5 FPGA
The proposed architecture is also completely scalable, unlike those proposed in [25] and [52], allowing Gaussian noise to be generated by summing any power-of-2 number of binary variables. Moreover, it also allows throughput to be traded off for precision, which is very desirable for fading tap generators, where data rates are very low before interpolation. Table 4.3 shows the synthesis results for a Gaussian generator that sums 256 binary variables. This implementation uses 4 LFSRs with lengths 32, 33, 34, and 35. Synthesis results are shown for sequential factors of 4, 8, and 16. The sequential factor indicates the number of clock cycles needed to generate a noise sample. Lower sequential factors denote higher parallelism, which explains why they require more FPGA resources. The reported speed after synthesis only, for a grade-5 device, is about 190 MHz in all three cases.
Figure 4.15.: Normalized probability and the absolute error when summing 256 binary variables
Compared to the results in [52] our architecture requires far fewer resources, in addition to its excellent scalability. For the sake of comparison, we have also synthesized a parallel configuration with roughly the same performance. This configuration generates a sample each clock cycle and has 8 LFSRs with increasing lengths starting from 32, each with 32 parallel outputs, thus generating 256 random bits in parallel. The FPGA used is a Xilinx Virtex2 XC2V4000-6. Out of the 23040 slices available, only 880 are used, that is, about 3.8%. The maximum frequency reported after synthesis was 180 MHz. This compares very favorably with the reference implementation, which takes up 10% of the same FPGA and runs at 133 MHz post synthesis.
4.3. Spectrum-Shaping Filter
The challenge is how to design a filter with the desired magnitude Y_d(\omega). A FIR filter would require a very large number of taps, while an all-pole or auto-regressive filter, although relatively more efficient, would still require a high order to approximate the autocorrelation at large lags. Our approach uses an adaptation of the design method presented in [88] for designing IIR filters with an arbitrary impulse response.
According to this method, an IIR filter of order 2N is synthesized as a cascade of N second-order canonic sections (bi-quads). A general second-order filter is defined by five coefficients. Its transfer function has two poles and two zeros, which are usually complex conjugate pairs:
H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}    (4.10)
The straightforward implementation of this equation leads to the configuration in Figure 4.16a, referred to as the direct form I or DF-I. The first stage implements the zeros (numerator), while the second one implements the poles (denominator), or the autoregressive part. If the two stages are swapped, a more hardware-efficient implementation is obtained, which saves two data registers, as shown in Figure 4.16b. This configuration is referred to as the direct form II or DF-II, and will be used in our implementation.
(a) Direct form I
(b) Direct form II
Figure 4.16.: Second-order canonical filter section
For N cascaded second-order sections, the transfer function is given by the following equation, in which the constant K has been obtained by factoring out the b_0 coefficients. In this way, each section is defined by four coefficients instead of five.
H(z) = K \prod_{n=1}^{N} \frac{1 + b_{1,n} z^{-1} + b_{2,n} z^{-2}}{1 + a_{1,n} z^{-1} + a_{2,n} z^{-2}}    (4.11)
For z = e^{j\omega}, H(z) approaches the desired magnitude response Y_d(\omega). The design of the filter is an optimization problem which consists in finding the set of 4N coefficients (a_{1,n}, a_{2,n}, b_{1,n}, b_{2,n}) that leads to the best approximation of Y_d(\omega).
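The cascade magnitude |K F(z, x)| of (4.11) can be evaluated numerically as in the following sketch (the function name and the flat (b1, b2, a1, a2) section layout are ours):

```python
import cmath

def cascade_magnitude(omega, sections, K=1.0):
    """Magnitude of K * prod_n (1 + b1 z^-1 + b2 z^-2)/(1 + a1 z^-1 + a2 z^-2)
    evaluated at z = exp(j*omega), per (4.11).

    Each section is a tuple (b1, b2, a1, a2), i.e. the four coefficients
    that remain per biquad after K has been factored out.
    """
    z1 = cmath.exp(-1j * omega)   # z^-1 on the unit circle
    z2 = z1 * z1                  # z^-2
    h = complex(K)
    for b1, b2, a1, a2 in sections:
        h *= (1 + b1 * z1 + b2 * z2) / (1 + a1 * z1 + a2 * z2)
    return abs(h)
```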
4.3.1. Filter Design Algorithm
The first step is to discretize Y_d(\omega) by dividing the Nyquist interval [0, \pi] into M + 1 frequency points, so that \omega_i = i\pi/M and Y_{d,i} = Y_d(\omega_i), where i = 0, 1, \ldots, M. We also define z_i = e^{j\omega_i}.
For a Jakes spectrum, given by (2.5), we define L = \nu M, where \nu \in [0, 1] is the desired discrete Doppler rate. The discretized magnitude response is thus given by the following equation:
Y_{d,i} = \begin{cases} \dfrac{1}{\sqrt{1 - (i/L)^2}} & i = 0, 1, \ldots, L-1 \\ \sqrt{L\left(\dfrac{\pi}{2} - \arcsin\dfrac{L-1}{L}\right)} & i = L \\ 0 & i = L+1, \ldots, M \end{cases}    (4.12)
The response for i = L results from the requirement that the area under the spectrum be equal
for the sampled and the continuous cases.
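Equation (4.12) translates directly into a small sketch (the function name and the rounding L = int(nu * M) are our own choices):

```python
import math

def jakes_profile(M, nu):
    """Discretized Jakes magnitude response per eq. (4.12).

    M  -- number of subdivisions of the Nyquist interval (M+1 output points)
    nu -- desired discrete Doppler rate in [0, 1]; cut-off index L = nu*M
    """
    L = int(nu * M)
    Y = []
    for i in range(M + 1):
        if i < L:
            Y.append(1.0 / math.sqrt(1.0 - (i / L) ** 2))
        elif i == L:
            # area-equality correction at the cut-off bin
            Y.append(math.sqrt(L * (math.pi / 2 - math.asin((L - 1) / L))))
        else:
            Y.append(0.0)
    return Y
```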
We define the vector x of length 4N that contains the coefficients b_{1,n}, b_{2,n}, a_{1,n}, a_{2,n} and express H(z) = K F(z, x), where F(z, x) is the product of biquad transfer functions in (4.11), apart from K. Designing the filter now consists in minimizing the mean squared error (MSE):
E(K, x) = \frac{1}{M+1} \sum_{i=0}^{M} \left( |K F(z_i, x)| - Y_{d,i} \right)^2    (4.13)
E(K, x) is a function of 4N + 1 variables. In order to reduce the problem order, we determine the value of K that minimizes E(K, x). Knowing that K is positive, differentiating E with respect to K and equating with zero yields the optimum value K_o:
K_o = \frac{\sum_{i=0}^{M} |F(z_i, x)| \, Y_{d,i}}{\sum_{i=0}^{M} |F(z_i, x)|^2}    (4.14)
The problem is now reduced to optimizing R(x) = E(K_o, x) in 4N dimensions. The gradient \nabla_x R(x) is a vector with the partial derivatives of R(x) with respect to each element x_v of x, where v \in [1, 4N].
\frac{\partial R(x)}{\partial x_v} = \frac{2 K_o}{M+1} \sum_{i=0}^{M} \left( K_o |F(z_i, x)| - Y_{d,i} \right) \frac{\partial |F(z_i, x)|}{\partial x_v}    (4.15)
Evaluating (4.15) for all frequencies i and biquad stages n requires the calculation of 4MN partial derivatives:
\frac{\partial |F(z_i, x)|}{\partial a_{1,n}} = -|F(z_i, x)| \, \Re\left\{ \frac{z_i^{-1}}{1 + a_{1,n} z_i^{-1} + a_{2,n} z_i^{-2}} \right\}    (4.16)

\frac{\partial |F(z_i, x)|}{\partial a_{2,n}} = -|F(z_i, x)| \, \Re\left\{ \frac{z_i^{-2}}{1 + a_{1,n} z_i^{-1} + a_{2,n} z_i^{-2}} \right\}    (4.17)

\frac{\partial |F(z_i, x)|}{\partial b_{1,n}} = +|F(z_i, x)| \, \Re\left\{ \frac{z_i^{-1}}{1 + b_{1,n} z_i^{-1} + b_{2,n} z_i^{-2}} \right\}    (4.18)

\frac{\partial |F(z_i, x)|}{\partial b_{2,n}} = +|F(z_i, x)| \, \Re\left\{ \frac{z_i^{-2}}{1 + b_{1,n} z_i^{-1} + b_{2,n} z_i^{-2}} \right\}    (4.19)
We now have all the quantities needed for iterative optimization. The evaluation of the cost function and the gradient is performed according to the following steps:
1. Starting from a vector x, evaluate F(z_i, x) at all frequencies i \in [0, M].
2. Compute the optimum scaling factor K_o using (4.14).
3. Evaluate the error E_i at all frequencies i \in [0, M]:
   E_i = K_o |F(z_i, x)| - Y_{d,i}    (4.20)
4. Evaluate the squared-error cost function R(x):
   R(x) = \frac{1}{M+1} \sum_{i=0}^{M} E_i^2    (4.21)
5. Determine the elements of the gradient vector \nabla_x R(x). In the equation below, x_v is one of the a_{1,n}, a_{2,n}, b_{1,n}, b_{2,n} biquad coefficients, where v \in [1, 4N] and n \in [1, N]:
   [\nabla_x R(x)]_v = \frac{2 K_o}{M+1} \sum_{i=0}^{M} E_i \frac{\partial |F(z_i, x)|}{\partial x_v}    (4.22)
Knowing how to evaluate the cost function and the gradient, we can use an iterative optimization technique to find an optimum coefficient set. We have selected the Ellipsoid algorithm because it is very simple to code and works well with highly non-linear target functions. Convergence is not as fast as with a descent method, but since the filter design is done only once, this is not a major issue.
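The cost and gradient evaluation of the steps above can be sketched as follows. For brevity, this sketch replaces the analytic partials (4.16)-(4.19) with central finite differences; the names and the flat [b1, b2, a1, a2] coefficient layout are ours:

```python
import cmath
import math

def biquad_F(z, x):
    """F(z, x): biquad cascade of (4.11) without K; x = [b1,b2,a1,a2]*N."""
    h = complex(1.0)
    for n in range(0, len(x), 4):
        b1, b2, a1, a2 = x[n:n + 4]
        h *= (1 + b1 / z + b2 / z**2) / (1 + a1 / z + a2 / z**2)
    return h

def cost(x, Yd):
    """R(x) per (4.20)-(4.21), with the optimum K of (4.14) substituted."""
    M = len(Yd) - 1
    zs = [cmath.exp(1j * math.pi * i / M) for i in range(M + 1)]
    mags = [abs(biquad_F(z, x)) for z in zs]
    Ko = sum(m * y for m, y in zip(mags, Yd)) / sum(m * m for m in mags)
    return sum((Ko * m - y) ** 2 for m, y in zip(mags, Yd)) / (M + 1)

def grad(x, Yd, eps=1e-6):
    """Finite-difference gradient of R(x) -- a slow but simple stand-in
    for the analytic gradient (4.22) with the partials (4.16)-(4.19)."""
    g = []
    for v in range(len(x)):
        xp = list(x); xp[v] += eps
        xm = list(x); xm[v] -= eps
        g.append((cost(xp, Yd) - cost(xm, Yd)) / (2 * eps))
    return g
```

Any gradient-aware or gradient-free optimizer (the thesis uses the Ellipsoid algorithm) can then drive `cost` and `grad` toward a coefficient set matching the target profile.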
4.3.2. A Case Study
In the following, we use the method outlined above to design spectrum-shaping filters for the three Doppler profiles introduced in Subsection 2.1.2: Jakes, flat, and Gaussian. The generated magnitudes are normalized, in the sense that the magnitude is always 1 at DC. We have created a flexible function that generates discrete Doppler profiles, which accepts three parameters:
1. The number of subdivisions M of the Nyquist interval. The resulting Doppler profile will have M + 1 elements.
2. The Doppler profile type, which can be either Jakes, flat, or Gaussian.
3. The Doppler frequency f_D. For the Jakes and flat profiles it is the maximum Doppler frequency f_{Dmax}, while for the Gaussian profiles it represents the Doppler spread \sigma_D.
For our scenario we used M = 512 frequency points, since this ensures a good trade-off between precision and computation time. The Doppler frequency f_D was chosen to be 0.25. The rationale is that we want low frequencies because they result in lower interpolation errors. However, if the frequency is too low, the constraints on the spectrum-shaping filter are increased.
The filter design algorithm accepts three parameters:
1. The discrete magnitude response at M + 1 frequencies; in this case, the desired Doppler profile.
2. The number of biquad sections.
3. The target mean squared error (MSE) for the resulting magnitude response.
In the following, we investigate the relationship between the desired MSE and the required number of biquad sections for the three Doppler profiles, for a discrete Doppler frequency of 0.25. For a number of biquads between 1 and 8 we determine the minimum achievable MSE. The results, shown in Table 4.4, indicate that a Gaussian filter requires far fewer stages for a given MSE. The reason is that its magnitude response is very smooth, unlike the Jakes and flat filters, which exhibit a very sharp cut-off. The Gaussian filter achieves excellent performance with only two biquad stages, while for the other two at least five stages are required. That is why no MSE values are given for Gaussian filters with more than two stages. It is also worth mentioning that each extra stage reduces the MSE by a factor of approximately 10, i.e. by 10 dB.
Table 4.4 helps us choose the appropriate number of stages depending on the specific requirements. To be on the safe side, we impose a maximum MSE of 10^-6. The appropriate filter sizes and the actual MSEs achieved in one of the optimizations are listed in Table 4.5. The MSE is given for the original 512 frequency points at which the filters were optimized.
Figure 4.17 shows the magnitude response of the three designed filters on logarithmic and linear magnitude axes. The logarithmic plots also display the error between the designed and the theoretical magnitude. Since these filters will now be used for both HW and SW implementations, we also list the designed coefficients in Appendix B for reference.

# SOS    Jakes          Flat           Gaussian
  1      7.45 x 10^-2   1.92 x 10^-2   1.21 x 10^-4
  2      2.09 x 10^-2   3.65 x 10^-3   1.74 x 10^-9
  3      2.66 x 10^-3   8.75 x 10^-4   -
  4      3.03 x 10^-4   1.20 x 10^-4   -
  5      3.65 x 10^-5   1.30 x 10^-5   -
  6      4.40 x 10^-6   1.24 x 10^-6   -
  7      5.10 x 10^-7   1.24 x 10^-6   -
Table 4.4.: Minimum achievable MSE

Doppler type   Taps   MSE
Jakes          7      9.50 x 10^-7
Flat           7      9.81 x 10^-7
Gaussian       2      1.31 x 10^-9
Table 4.5.: MSE for the designed filters

The multiplicative constant is scaled so that the variance of a white Gaussian noise remains unchanged after filtering.
Since the filter introduces correlation between the samples of a white Gaussian noise and reduces the noise variance (power), the signal has to be amplified after filtering to restore its original power level. If we know the impulse response g(n) of the filter (discrete and infinite), the subunit power gain A_p is given by (4.23). The higher the number of samples considered, the higher the accuracy.
A_p = \frac{\sigma_{out}^2}{\sigma_{in}^2} = \sum_{n=0}^{\infty} g(n)^2    (4.23)
If we know the magnitude response H(f) (continuous and finite), the power gain can also be expressed as in (4.24), where N + 1 is the number of frequencies at which the magnitude response is evaluated. For good precision, N must be high enough, e.g. at least 1024.
A_p = \frac{\sigma_{out}^2}{\sigma_{in}^2} = \frac{1}{N+1} \sum_{i=0}^{N} H(i/N)^2    (4.24)
Once the power gain A_p is determined, the normalization factor is simply the inverse of the square root of A_p. For the three filters we designed, we merged the multiplicative constant K with the normalization factor, ending up with a single multiplication instead of two.
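The normalization factor 1/\sqrt{A_p} of (4.24) can be sketched directly (the function name is ours; H maps normalized frequency to magnitude):

```python
import math

def normalization_factor(H, N=1024):
    """Return 1/sqrt(A_p), with the power gain A_p computed from the
    magnitude response per (4.24), sampled at N+1 frequencies."""
    Ap = sum(H(i / N) ** 2 for i in range(N + 1)) / (N + 1)
    return 1.0 / math.sqrt(Ap)
```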
[Figure panels, magnitude vs. normalized frequency 0-1: (a) Jakes, log.; (b) Flat, log.; (c) Gaussian, log.; (d) Jakes, lin.; (e) Flat, lin.; (f) Gaussian, lin. The logarithmic panels show both the magnitude response and the magnitude error.]
Figure 4.17.: Magnitude responses of the designed Doppler filters
The constants K in Table B.1 are already normalized to ensure a unity power gain for WGN
input.
4.3.3. Implementation Guidelines
The filter design algorithm described in the previous section returns second-order sections with poles and zeros in arbitrary order. Finite-precision hardware realizations, however, require optimizing the filter towards one of the following two goals: 1) minimizing the probability of overflow, or 2) minimizing the peak round-off noise. In the case of a Doppler spectrum-shaping filter, the latter optimization is the more desirable.
For a given transfer function implemented with second-order sections, minimizing the peak round-off noise is achieved by 2-norm scaling and by reordering the sections so that the poles are in descending order [91], which means that the first section has the poles closest to the unit circle. The zeros and the poles are then grouped according to their proximity, starting with the poles closest to the unit circle and successively matching each pole with the closest remaining zeros until all of them are matched.
The goal of the 2-norm scaling is to achieve a constant noise variance after each section. The 2-norm of a filter is defined by the equation below; intuitively, it is the area under the squared magnitude response, or the power gain for filtered white noise.
\|H\|_2 = \sqrt{ \frac{1}{2\pi} \int_0^{2\pi} |H(e^{j\omega})|^2 \, d\omega }    (4.25)
The normalization is performed by adjusting the b_0 coefficient of each section so that the 2-norm of the filter formed by cascading the current section with all previous sections becomes one. The process starts with the first section and is described by the equation below, in which H_k is the magnitude response of the individual sections, while H_{1,n} is the magnitude response of all cascaded sections up to and including the current section n of the total number of sections N.
\|H_{1,n}\|_2 = \sqrt{ \frac{1}{2\pi} \int_0^{2\pi} \left| \prod_{k=1}^{n} H_k(e^{j\omega}) \right|^2 d\omega } = 1, \qquad n = 1, \ldots, N    (4.26)
In the case of a Doppler spectrum-shaping filter, the variance of the input white noise has to be preserved. Using the above scaling method, the variance remains the same after each filter section, and the additional multiplicative constant K is no longer necessary. With each new section, the frequency response converges to the overall filter response. As an example, we consider the Jakes filter designed in the previous section, which has seven second-order sections. Figure 4.18 shows the frequency response of the individual sections, as well as their cumulative response.
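The b_0 adjustment of (4.26) can be sketched as follows, with the integral of (4.25) approximated by a Riemann sum over the unit circle (the dictionary layout and function names are ours):

```python
import cmath
import math

def scale_b0(sections, M=2048):
    """Scale the b0 gain of each biquad so that the 2-norm of the cascade
    up to and including each section is one, per (4.26).

    Each section is a dict with keys b0, b1, b2, a1, a2; a new list with
    rescaled b0 values is returned.
    """
    def sec_h(s, z):
        return (s['b0'] + s['b1'] / z + s['b2'] / z**2) / \
               (1 + s['a1'] / z + s['a2'] / z**2)

    scaled = [dict(s) for s in sections]
    for n in range(len(scaled)):
        # power gain of the cascade 1..n, averaged over [0, 2*pi)
        acc = 0.0
        for i in range(M):
            z = cmath.exp(2j * math.pi * i / M)
            h = complex(1.0)
            for s in scaled[:n + 1]:
                h *= sec_h(s, z)
            acc += abs(h) ** 2
        norm2 = math.sqrt(acc / M)
        scaled[n]['b0'] /= norm2   # forces ||H_{1,n}||_2 = 1
    return scaled
```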
The hardware realization of a digital filter involves the discretization of the data samples and the coefficients. After multiplication and accumulation, the bitwidth of the intermediate results becomes very large and has to be scaled down. Dedicated scaling (casting) blocks are used for this purpose. For a discretized second-order section, the schematic containing the casting blocks is shown in Figure 4.19. Casting to a higher precision is denoted by <<, and casting to a lower precision by >>. In order to avoid the two subtractions in the straightforward DF-II implementation shown in Figure 4.16b, the minus sign is embedded into a_1 and a_2, which incurs no additional hardware cost.
Figure 4.19 also shows the bitwidth at different points in the signal path. The four bitwidths encountered in the implementation are the following:
[Figure panels, amplitude response (dB) vs. normalized frequency: (a) Individual sections 1-7; (b) Cumulative sections 1...n.]
Figure 4.18.: Magnitude response of the second-order sections for the designed Jakes filter
Figure 4.19.: Discretized second-order filter section with casting blocks
DW: data bitwidth
CW: coefficients bitwidth
SW: states bitwidth
AW: accumulator bitwidth
The result after multiplication and accumulation has the largest bitwidth, which must be at least as large as the bitwidth after multiplication, SW+CW. The accumulation result needs to be cast down to the bitwidth of the filter states or of the output. Depending on the requirements, casting can be performed by truncation or rounding. Rounding produces a lower round-off noise but requires an extra adder. Another parameter of the casting is the overflow behavior, which can be either wrapping or saturation. Wrapping is obtained at no hardware cost, by simply discarding a number of MSBs. Saturation, however, requires two comparators and two multiplexors. Unlike casting to a lower precision, casting to a higher precision only consists in appending a number of zeros.
If multiple second-order sections are cascaded, casting of the input to the accumulator precision and of the accumulator to the output precision must be performed only once. No casting is necessary between the sections. Moreover, the accumulation of the multiplications with the a coefficients of a section can be combined with the accumulation of the multiplications with the b coefficients of the previous section. These considerations can be better observed in Figure 4.20, which shows three cascaded second-order sections, with the shared accumulations marked by dashed rectangles.
Figure 4.20.: Cascade of three second-order filter sections
Depending on the required throughput, different architectures can be used to implement a cascade of second-order filter sections. In the following, we denote by N the number of filter sections. The three most straightforward solutions are outlined in the following list.
Fully parallel. This solution ensures the highest throughput, of one sample per clock cycle, but also requires the most hardware resources. Each section has 4 multipliers, 5 adders, and 2 registers, exactly as in Figure 4.16b. The requirements grow linearly with N.
Individual sequential. Each section contains one multiply-and-accumulate (MAC) unit, which is operated sequentially under the control of a small state machine. The computation of one output sample takes 5 clock cycles, which ensures a throughput independent of N. The complexity, however, still grows linearly with N.
Fully sequential. All computations are performed sequentially using a single MAC unit under the control of a state machine. The computation time is 5N clock cycles, growing linearly with the number of sections. The essential advantage is that it has the smallest area. One drawback, however, is that the internal bitwidths must be the same for all sections, which precludes the local fine-tuning possible with the other two solutions.
As the required throughput before interpolation is very low in the case of a Doppler spectrum generator, the fully sequential architecture is the ideal candidate for a hardware implementation. A possible implementation is presented in Figure 4.21. The filter states for all sections are stored in a single synchronous RAM block, whereas the constant coefficients are stored in a dedicated ROM. The memory maps for both ROM and RAM are shown in Figure 4.22.
Figure 4.21.: Sequential architecture for cascaded second-order sections
It is worth mentioning that no full dual-port RAM is needed, because the same address is used for read and write. Moreover, if the RAM supports the read-after-write mode, as is the case for many FPGA embedded RAM blocks, the multiplexor in front of the multiplier and the delay register on the feedback path can be saved. The accumulation result written into the RAM would then be available at its output in the next clock cycle.
If the fading tap generator needs to support different Doppler profiles, such as Jakes, flat, or Gaussian, the number of sections and the filter coefficients must be made configurable. This can be easily achieved with the presented architecture by making the control state machine dependent on the number of sections (a configuration parameter) and by replacing the coefficients ROM with a dual-port RAM block that can be written by a central processor. Synchronous on-chip memories, which can be used as RAM or ROM, are readily available in almost all modern FPGA devices. Most FPGA synthesis tools are able to infer them from HDL, which results in generic and scalable designs.
The control state machine is relatively simple and consists of two counters plus a few additional logic gates. One counter iterates through the sections, while the other sequences the 5 MAC operations for each section. The output signals of the state machine are shown in Figure 4.23 for a three-section filter. The proposed architecture has no wait states and keeps the MAC busy for the entire duration of a computation, so that the computation takes exactly 5N clock cycles, where N is the number of filter sections. Pipelining increases the filter latency by 3 cycles, but does not affect its throughput.
[Address maps: the coefficients ROM holds a_{1,n}, a_{2,n}, b_{0,n}, b_{1,n}, b_{2,n} for n = 1...3 at addresses 00-14; the states RAM holds s_{1,n}, s_{2,n} at addresses 00-05.]
Figure 4.22.: Address maps for the coefficients and the filter states
[Timing diagram: rom_addr, ram_addr (read/write), ram_wen, acc_init, acc_init_sel, start, and done over the 15 cycles of a three-section computation.]
Figure 4.23.: Signals generated by the control FSM for a three-section filter
4.4. Spectrum Shifter
In some cases, such as the ionospheric channel profiles, a constant Doppler shift is specified for each path, in addition to the Doppler spread. This frequency shift is performed by multiplying the filtered complex signal with a complex exponential exp(j\pi f_{Dsh} n), where f_{Dsh} is the normalized Doppler shift between 0 and 1 and n is the sample index. It is essential that the frequency shift not translate the original symmetric spectrum past the Nyquist limit, i.e. the condition f_{Dmax} + f_{Dsh} < 1 has to be fulfilled; otherwise aliasing would occur.
Implementing a frequency shift in SW is trivial and consists in a call to sin and cos followed by a complex multiplication with the input signal. In HW, however, generating a sin/cos pair and then performing the complex multiplication is not the most efficient approach. A discrete sin/cos generator with programmable frequency is usually implemented using a phase accumulator and a look-up table. An alternative solution is to use a CORDIC rotator to compute the sin/cos values for the generated phase. This HW solution is shown in Figure 4.24a.
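The software path can be sketched in a few lines (assuming, per the normalization used here, a rotation of exp(j*pi*f_Dsh*n) with Nyquist at 1; the function name is ours):

```python
import cmath

def frequency_shift(samples, f_dsh):
    """Shift a complex baseband signal by the normalized Doppler shift
    f_dsh in [0, 1), Nyquist = 1: y[n] = x[n] * exp(j*pi*f_dsh*n)."""
    return [x * cmath.exp(1j * cmath.pi * f_dsh * n)
            for n, x in enumerate(samples)]
```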
The complex multiplier can be completely eliminated if we realize that there is no need to actually generate sin and cos, but only to rotate the complex input signal by the phase computed by the phase accumulator. The same CORDIC rotator used for sin/cos generation can be directly inserted in the signal path, thus obviating the need for a complex multiplier. This HW solution is shown in Figure 4.24b for comparison.
[Figure panels: (a) Direct implementation - phase accumulator, sin/cos generation, and complex multiplier; (b) Optimized implementation - the CORDIC rotator placed directly in the signal path.]
Figure 4.24.: Frequency shifter implementations
The width of the phase accumulator depends on the required frequency resolution. For an N-bit accumulator, the frequency can be varied linearly between 0 and 1 in 2^{N-1} increments, with a frequency resolution of 1/2^{N-1}. If the phase accumulator increment is A_{inc}, the normalized frequency shift is exactly A_{inc}/2^{N-1}.
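A floating-point sketch of a CORDIC rotator in rotation mode may help illustrate the block at the heart of Figure 4.24b (the hardware version would use fixed-point shifts and adds; this simple variant converges for |angle| up to about pi/2 and beyond, within the CORDIC convergence range):

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the complex sample (x + jy) by `angle` radians using the
    shift-and-add CORDIC algorithm in rotation mode."""
    # pre-computed gain compensation factor K = prod 1/sqrt(1 + 2^-2i)
    K = 1.0
    for i in range(iterations):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    for i in range(iterations):
        d = 1.0 if angle >= 0 else -1.0       # micro-rotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)     # residual angle
    return x * K, y * K
```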
4.5. Polyphase Interpolator
The interpolation factors can be very high: the narrower the desired spectrum, the higher the interpolation factor. Values of 1000 are not uncommon in many applications, such as Doppler fading tap generators.
Various interpolation solutions for fading tap generators have been proposed in the literature. In [51] the interpolation factor is restricted to integers and a single poly-phase interpolation stage is used. In [82] a multi-stage approach is preferred, such as described in [14]. The interpolation factor is in this case a composite number, i.e. L = \prod_{k=0}^{K-1} L_k, where K is the number of successive interpolation stages with integer interpolation factors.
Restricting the interpolation factor to integers may not be flexible enough for certain applications. One example would be a simulation that requires the Doppler frequency to sweep a given interval with many intermediate steps. The accuracy would suffer because of the discontinuities caused by the discrete set of Doppler frequencies that can be generated. However, the computational efficiency of integer-factor interpolation is very good for a poly-phase implementation, especially for single-stage solutions.
Another essential disadvantage of the conventional interpolation solutions is that they are very inflexible for hardware implementation. Changing the interpolation factor entails changing all poly-phase coefficient sets. This is not an issue in software, where the coefficients are computed off-line before the simulation starts. In hardware, however, the different coefficient sets need to be computed off-line during design and stored, for each interpolation factor. There are practical solutions that alleviate this problem by reducing the number of stored coefficients, but they all suffer from increased complexity and reduced flexibility.
The solution we propose eliminates all the above-mentioned problems. It offers very high flexibility in choosing the interpolation factor, without sacrificing computational efficiency and scalability, being at the same time suitable for both software and hardware implementation. Moreover, both up-sampling and down-sampling can be implemented with the same structure, which might be of interest in some applications. In software, it allows arbitrary interpolation factors, the only limitation being the intrinsic floating-point precision. In hardware, the interpolation factors are restricted to the form 2^N/X, where N is the width of an accumulator register and X is an integer increment.
4.5.1. Architecture Overview
The architecture of the interpolator (resampler), shown in Figure 4.25, consists of three main
components:
Phase accumulator
Coefficients generator
Interpolation (resampling) FIR filters
The phase accumulator keeps track of the sub-sample phase by incrementing its value by a constant for each output sample. A new sample is read from the input each time the accumulator overflows. We denote by N the width of the accumulator and by A_{inc} the increment. The upsampling factor, f_{s,out}/f_{s,in}, is given by the formula:
F = \frac{2^N}{A_{inc}}    (4.27)
For upsampling, we have A_{inc} < 2^N. An example of the upsampling process is shown in Figure 4.26 for N = 4 and A_{inc} = 5, with a resulting upsampling factor of 16/5. In the figure, red arrows mark the transitions at which accumulator overflow occurs. It is readily apparent that the accumulator phase represents the sub-sample phase of the output interpolated samples.
Figure 4.25.: Resampler architecture
This normalized phase is computed by dividing the accumulator value by 2^N and lies in the range [0, 1).
[Accumulator phase sequence for N = 4, A_{inc} = 5: 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, ...; each overflow triggers the reading of a new input sample.]
Figure 4.26.: Example of upsampling process
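The pull-mode accumulator of Figure 4.26 can be reproduced directly (the function name is ours):

```python
def upsample_phases(n_bits, a_inc, n_out):
    """Phase-accumulator sequence for the pull-mode resampler.

    Returns one sub-sample phase per output sample, plus the number of
    input samples consumed; an accumulator overflow (wrap modulo
    2**n_bits) is the cue to read the next input sample.
    """
    acc, phases, reads = 0, [], 0
    for _ in range(n_out):
        phases.append(acc)
        acc += a_inc
        if acc >= 1 << n_bits:        # overflow: consume one input sample
            acc -= 1 << n_bits
            reads += 1
    return phases, reads
```

For N = 4 and A_{inc} = 5 this reproduces the phase sequence 0, 5, 10, 15, 4, 9, ... shown in the figure.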
The interpolator works in pull mode, i.e. it is driven by the output. The output data rate is fixed and controls the phase accumulator. The input rate is lower and variable, depending on the actual interpolation factor. Decimators, on the other hand, work in push mode, where the accumulator is operated at the fixed rate of the input, while the output rate depends on the decimation factor.
The sub-sample phase generated by the phase accumulator is now used for resampling. Resampling means generating an output sample from a predefined number of input samples, with a relative phase between two original samples. The resampling process consists in multiplying the input samples with a set of weights that depend on the desired phase. Physically, the resampler is implemented as a FIR filter with variable coefficients and a block that generates the appropriate coefficients based on the desired phase. In the field of communications, the resampler block is also referred to as a variable delay element, since it can be regarded as introducing a sub-sample delay in the input signal.
An advantage of the proposed interpolation architecture is apparent in the case of multi-channel interpolation, i.e. when more than one data channel is upsampled at the same time. In our case, the signal to be upsampled is complex, so we have two channels processed in parallel. Other cases include multi-channel audio sample-rate conversion and image scaling, e.g. RGB or YUV. In such cases, the phase accumulator and the coefficients generator can be shared, and only one interpolation FIR filter per extra channel is needed, as shown in Figure 4.27.
Figure 4.27.: Multi-channel interpolation architecture
4.5.2. Polyphase Coefficients Generator
In order to determine the relationship between the desired phase and the coefficients of the interpolation filter, we first discuss the poly-phase decomposition of interpolation filters. The principle of interpolation is to first insert zero samples between the original samples, then apply a low-pass filter to reject the images created by oversampling at multiples of the sampling frequency. Figure 4.28 shows the block schematic and the spectra for a 5x interpolation: first, 4 zeros are inserted between original samples, followed by low-pass filtering with f_c = 1/5.
(a) Structure (b) Spectra
Figure 4.28.: Interpolation structure and spectra
If the interpolation filter is ideal, i.e. rectangular with f_c = 1/5, no loss of information occurs. When real filters are used, however, the interpolated signal is distorted. There are two classes of distortions: a) linear distortions, due to the high-frequency attenuation of the filter, and b) non-linear distortions, caused by insufficient attenuation of the image components.
Since most of the multiplications in the filter are with zero samples, the straightforward implementation of the textbook structure in Figure 4.28 is very inefficient, and for larger interpolation factors the situation becomes even worse. The standard solution is to use a filter with coefficients that depend on the phase of the output signal. For 8x interpolation, there are eight possible output phases and therefore eight coefficient sets.
An example of the polyphase decomposition of the original interpolation FIR filter is shown in Figure 4.29, where the eight phases are shown using different colors. The original filter is symmetrical with 47 taps, whereas the poly-phase filter has only 6 taps and 8 sets of coefficients. Thus, the number of MAC operations per output sample decreases from 47 to 6. The saving is even more significant for very high interpolation factors, for which the textbook filter approach would be extremely inefficient.
Figure 4.29.: Interpolation filter poly-phase decomposition
Besides the reduced number of MAC operations, another essential advantage of the polyphase structure is that the interpolation filter size is independent of the interpolation factor. The latter determines only the number of coefficient sets. Such a poly-phase implementation can be used to generate interpolated samples with any phase between 0 and 1, by simply using the appropriate coefficient set. If the number of desired phases becomes very large, however, the storage needed for the coefficients becomes prohibitive.
The solution we propose enables the generation of any output phase using a relatively low
number of stored coecients sets. The idea is to store coecients for a limited number of
equidistant phases P, usually a power of 2. For a given phase, the actual lter coecients are
computed by linear interpolation between two adjacent coecients sets. Figure 4.30 shows
the interpolation process for the two central taps of a 6-tap lter. The number of coecients
sets for P phases is P+1 because the coecients for phase 1 are needed for linear interpolation.
These are simply the coecients for phase 0 reversed. In the case of 4 poly-phases, 5 coecients
sets are stored, one for each of the following phases: 0, 1/4, 2/4, 3/4 and 1.
94 Chapter 4 Multipath Channel Simulators
[Two plots: coefficient value vs. subsample phase 0 to 1, for the central taps C3 and C4, showing the stored values at phases 0, 1/8, ..., 1 and the linear interpolation between them]
Figure 4.30.: Linear interpolation between stored coefficients
The sampling interval is thus divided into P segments of equal widths. In order to perform linear interpolation, the segment number K_s and the intra-segment phase φ_s have to be computed, using the following relationships. The desired output phase in the range [0 . . . 1) is denoted here by φ_o.

K_s = ⌊P · φ_o⌋,   K_s ∈ {0, 1, . . . , P − 1}   (4.28)

φ_s = P · φ_o − K_s,   φ_s ∈ [0 . . . 1)   (4.29)
The output coefficient C_int is computed by linear interpolation between the selected adjacent coefficients C_Ks and C_Ks+1:

C_int = C_Ks + φ_s · (C_Ks+1 − C_Ks)   (4.30)
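Equations (4.28) to (4.30) can be transcribed directly into Python; the coefficient values below are hypothetical, only the indexing follows the equations:

```python
import numpy as np

def interp_coeff(coeff_sets, phi_o):
    """Interpolate one filter coefficient for output phase phi_o in [0, 1).

    coeff_sets holds P+1 precomputed values of this coefficient, for the
    equidistant phases 0, 1/P, ..., (P-1)/P, 1; the extra set for phase 1
    is needed as the upper end of the last linear-interpolation segment.
    """
    P = len(coeff_sets) - 1
    Ks = int(P * phi_o)            # segment number, eq. (4.28)
    phi_s = P * phi_o - Ks         # intra-segment phase, eq. (4.29)
    # eq. (4.30): linear interpolation between adjacent coefficient sets
    return coeff_sets[Ks] + phi_s * (coeff_sets[Ks + 1] - coeff_sets[Ks])

# With P = 4 stored phases, phi_o = 0.125 lands halfway inside segment 0
c = np.array([1.0, 0.8, 0.5, 0.2, 0.0])   # hypothetical values of one tap
mid = interp_coeff(c, 0.125)               # ~0.9, halfway between 1.0 and 0.8
```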
In hardware, the phase φ_o is encoded using a fixed number of bits N. If the number of coefficient sets is a power of 2, say 2^Q, the selection of the two adjacent coefficient sets is done with two multiplexors controlled by the first Q MSBs of the N-bit phase word. The remaining N − Q bits represent the intra-segment phase φ_s and are directly used for linear interpolation. The schematic is shown in Figure 4.31 for 2^Q = 4. Such a structure is needed for every filter coefficient.
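The MSB/LSB split of the phase word can be sketched with integer arithmetic; N = 10, Q = 2, and the phase value are arbitrary example numbers, not taken from the design:

```python
# N-bit phase word, 2**Q stored coefficient sets:
# the Q MSBs select the segment (mux control), the N-Q LSBs give the
# intra-segment phase used for linear interpolation.
N, Q = 10, 2
phase_word = 459                     # encodes phi_o = 459/1024
Ks = phase_word >> (N - Q)           # Q MSBs: segment number / mux select
phi_s = (phase_word & ((1 << (N - Q)) - 1)) / (1 << (N - Q))  # N-Q LSBs
print(Ks, phi_s)                     # 1 0.79296875
```

This matches the arithmetic definition: P · φ_o = 4 · 459/1024 = 1.79296875, i.e. segment 1 with intra-segment phase 0.79296875.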
In most applications, the coefficients for each phase are normalized, i.e. their sum is one. This ensures that the response at DC is the same for all phases. If this condition is not met, ripple occurs for slowly varying signals, which shows up as high-frequency spurious components in the spectrum of the interpolated signal.
When coefficients have finite precision, the normalization of the interpolated coefficients might be affected, even if the original coefficient sets are normalized. Simulations show that for discretized coefficients having the sum for each phase equal to 256, the resulting sum after linear interpolation can vary by ±2 around the average 256. The solution is to renormalize
[Schematic: the 2 MSBs of the N-bit phase word control two multiplexors that select adjacent coefficient sets from constant storage (CX0 . . . CX4); the N−2 LSBs drive the linear interpolation producing CXINT]
Figure 4.31.: Schematic of a coefficient generator
the coefficients after linear interpolation. Renormalization is performed by computing the error of the sum and subtracting it from one of the two central coefficients or, even better, from the largest of them. The schematic of the proposed solution is shown in Figure 4.32 for a 4-tap interpolation filter.
[Schematic: four coefficient generators, one per tap, each consisting of two multiplexors selecting among five stored coefficient sets followed by linear interpolation; a final normalization stage computes the error of the sum relative to 256 and subtracts it from the selected central coefficient, producing the normalized coefficients C0N . . . C3N]
Figure 4.32.: Post-interpolation coefficients renormalization
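A sketch of the renormalization rule described above, assuming integer coefficient sets that should sum to 256; the coefficient values are made up for illustration:

```python
import numpy as np

def renormalize(c_int, target=256):
    """Restore the coefficient-sum invariant after linear interpolation.

    c_int: interpolated integer coefficients of one phase. Their sum may
    deviate slightly (e.g. by +/-2 around 256) even if every stored set
    sums to `target`. The error is subtracted from the larger of the two
    central taps, where the relative disturbance is smallest.
    """
    c = np.asarray(c_int).copy()
    err = int(c.sum()) - target
    n = len(c)
    center = [n // 2 - 1, n // 2]          # the two central taps
    k = max(center, key=lambda i: c[i])    # pick the larger one
    c[k] -= err
    return c

out = renormalize([3, 60, 131, 58, 4, 2])  # sum was 258, error +2
```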
4.5.3. Interpolation Functions
There are several types of interpolation functions known in the literature, which can be divided into three main categories:
- Polynomial: Lagrange, spline
- Windowed sinc: Lanczos, Hamming, Hanning
- Optimal, matched to the signal's spectrum
The two main parameters of an interpolation function are the number of samples of the original signal that are taken into account for interpolation and the integer interpolation factor. Additionally, the optimal signal-matched interpolation requires the knowledge of the signal spectrum or its autocorrelation function. For our study we consider the Lagrange, Lanczos, and the signal-matched interpolation.
The interpolation function is decomposed into its poly-phase components for efficiency reasons, as shown in Subsection 4.5.2. Each output sample is obtained by multiplying a fixed number of neighboring input samples by a set of coefficients that depend on the desired phase.
We want to compare the performance of various interpolation functions for a given input signal. In the following analysis we consider a flat-spectrum band-limited Gaussian noise. First, we consider a bandwidth of 0.25 and determine the mean square error (MSE) of the interpolated output as a function of the sub-sample phase between 0 and 1. The testbench, shown in Figure 4.33, consists of an ideal band-limited noise generator, followed by a decimator and an interpolator. The decimation and the interpolation factors are equal to the number of phases for which the analysis is performed. The bandwidth of the noise generator is the desired bandwidth before interpolation divided by the interpolation factor. It is essential that the generated noise has very low spectral components outside the band of interest, otherwise these would fold back into the Nyquist band after decimation and would appear as interpolation errors.
[Block diagram: band-limited noise generator (fmax = 0.25) → M:1 decimator (fmax = 0.25/M) → 1:M interpolator (fmax = 0.25/M); a parallel delay path feeds the MSE computation over the M phases]
Figure 4.33.: Interpolation error measurement
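The measurement of Figure 4.33 can be mimicked in a few lines of NumPy; here a plain 2-tap (linear) interpolator stands in for the filter under test, and M = 8 phases are analyzed:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                      # phases = decimation = interpolation factor
N = 1 << 14
# Ideal band-limited Gaussian noise: white noise, FFT-masked to f < 0.25/M
x = rng.standard_normal(N)
X = np.fft.rfft(x)
f = np.fft.rfftfreq(N)     # normalized frequency, Nyquist = 0.5
X[f > 0.25 / M] = 0.0
x = np.fft.irfft(X, N)

lo = x[::M]                # decimate: keep every M-th sample
# Interpolate back with a 2-tap (linear) kernel and measure MSE per phase
mse = np.zeros(M)
for p in range(M):         # sub-sample phase p/M
    frac = p / M
    est = (1 - frac) * lo[:-1] + frac * lo[1:]      # linear interpolation
    ref = x[p : p + (len(lo) - 1) * M : M]          # true samples at phase p
    mse[p] = np.mean((est - ref) ** 2) / np.mean(x ** 2)
# Phase 0 reproduces original samples exactly; mid-phases show the largest error
```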
The frequency response of the Lagrange poly-phase filters is shown in Figure 4.34 for sub-sample phases between 0 and 0.5, for 4 and 8 taps respectively. As the filter is symmetrical, the coefficients for phase φ are the coefficients of phase 1 − φ reversed, and their frequency responses are identical.
The frequency response of the overall Lagrange interpolation filter for an interpolation factor of 8 is shown in Figure 4.35, for 4, 6, and 8 taps. As expected, the longer the filter, the better its frequency response. The ideal interpolation filter should have a constant frequency response up to the Nyquist frequency, which is marked with a vertical line on the figure, while outside the Nyquist band the response should be zero. These conditions can only be fulfilled by a sinc filter with an infinite number of taps. Real interpolation filters, however, cannot fulfill either of these conditions, which gives rise to two categories of interpolation errors:
[Two plots: amplitude response vs. normalized frequency of the Lagrange poly-phase filters, for phases 0/8 to 4/8; (a) 4 taps, (b) 8 taps]
Figure 4.34.: Frequency response of Lagrange poly-phase filters
- Linear distortions, due to the attenuation of the upper frequencies in the original Nyquist band.
- Non-linear distortions, due to insufficient rejection of the image components outside the original Nyquist band (aliasing).
[Plot: amplitude response (dB) vs. normalized frequency 0 to 1 of the overall Lagrange interpolation filter, for 2, 4, 6, and 8 taps]
Figure 4.35.: Frequency response of Lagrange interpolation filters
It must be mentioned here that the ultimate cause of aliasing is the difference in frequency response of the poly-phase filters in Figure 4.34. If the responses were the same for all phases, no aliasing would occur, only attenuation of the high-frequency components of the original signal.
The Lanczos interpolation belongs to the windowed-sinc family of interpolation functions. In this case, the sinc function is windowed with the main lobe of a wider sinc. The relative width of the latter sinc is the interpolation factor, as shown in (4.31) for a factor of 3. The frequency responses of the poly-phase components for Lanczos interpolation are shown in Figure 4.36, for 4 and 8 taps respectively. It can be seen that Lanczos is better than Lagrange for higher-order interpolation filters and for signals with significant high-pass components.
L(x) = (sin(πx)/(πx)) · (sin(πx/3)/(πx/3))   for |x| < 3
L(x) = 0   for |x| ≥ 3   (4.31)
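Equation (4.31) transcribes directly to NumPy, whose np.sinc(t) already evaluates sin(πt)/(πt):

```python
import numpy as np

def lanczos(x, a=3):
    """Lanczos kernel, eq. (4.31): a sinc windowed by the main lobe of a
    sinc that is `a` times wider; zero outside |x| < a."""
    x = np.asarray(x, dtype=float)
    out = np.sinc(x) * np.sinc(x / a)      # np.sinc(t) = sin(pi t)/(pi t)
    return np.where(np.abs(x) < a, out, 0.0)

# Interpolation property: 1 at x = 0, (numerically) 0 at all other integers,
# so the kernel passes through the original samples exactly.
k0 = lanczos(0.0)
```

The parameter a = 3 corresponds to the factor-3 case shown in (4.31); other window widths follow the same pattern.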
[Two plots: amplitude response vs. normalized frequency of the Lanczos poly-phase filters, for phases 0/8 to 4/8; (a) 4 taps, (b) 8 taps]
Figure 4.36.: Frequency response of Lanczos poly-phase filters
The frequency response of the overall Lanczos interpolation filter for an interpolation factor of 8 is shown in Figure 4.37, for 4, 6, and 8 taps. Unlike the Lagrange filter, whose frequency response has lobes centered around odd multiples of the original Nyquist frequency, the Lanczos filter has a faster decaying frequency response, without clear periodic structures. This property is typical for all windowed-sinc filters.
The optimal signal-matched interpolation, described in detail in Subsection 5.4.2, has a frequency response close to that of the Lagrange interpolation, i.e. with lobes at odd multiples of the original Nyquist frequency, except that the lobes are smaller. Since this filter is MSE-optimized for a specific signal spectrum, it offers the best interpolation performance, provided that the actual signal has the spectrum for which the filter has been designed. This condition is always fulfilled in the case of a fading tap generator, where the Doppler spectrum before interpolation is known by design.
In the following, we want to evaluate the three interpolation functions for various signal bandwidths and interpolation filter lengths and select the best for our requirements. We consider a frequency range from 1/8 to 1 and filter lengths in the range 4 . . . 16. The testbench used is the one in Figure 4.33. The decimation/interpolation factor does not affect the result and has been set to 4 in this case. The results are plotted in Figure 4.38 for a bandwidth between 1/8 and 1 on a logarithmic scale.
[Plot: amplitude response (dB) vs. normalized frequency 0 to 1 of the overall Lanczos interpolation filter, for 4, 6, and 8 taps]
Figure 4.37.: Frequency response of Lanczos interpolation filters
The truncated-sinc interpolation was added as a reference to show that a smooth window is necessary to achieve decent interpolation results. As expected, the results show that the signal-matched interpolation offers the smallest error for all bandwidths. For high bandwidths, Lanczos interpolation is superior to Lagrange. The former has a slightly oscillating error floor at lower bandwidths. For comparison purposes, Figure 4.39 shows the results for the three interpolation types on the same axes, for 8 and 16 taps, this time only for the two octaves between 1/4 and 1.
For 8 taps, the error is less than -80 dB for bandwidths below 0.25, for both Lagrange and signal-matched interpolation. At 0.5 bandwidth, the error of the signal-matched interpolation is 20 dB smaller than for Lagrange. Two important conclusions can be drawn from this analysis. First, Lanczos interpolation, like other windowed-sinc varieties, offers the worst performance for bandwidths below 0.5 and is not appropriate for fading tap generation. Second, the interpolation error decreases very fast with the bandwidth, roughly 40 dB per octave for 8 taps and 60 dB per octave for 16 taps. For a bandwidth of 0.25, the interpolation error is less than -90 dB for both Lagrange and signal-matched interpolation. The latter achieves -90 dB even with 4 taps instead of 8, which doubles the computational efficiency.
4.5.4. Performance Analysis
The interpolation performance analysis in the previous section has been performed for a fixed interpolation factor, assuming the filter coefficients to be exactly computed for the desired phases. The number of phases is in this case the interpolation factor, so that the coefficients can be computed off-line and stored in a look-up table. In the case of a resampler, the number of phases is very large, usually a power of 2. The successive phases are computed using a phase accumulator with an increment A_inc. The resulting output frequency has the following expression, where N is the width of the phase accumulator.

f_out = f_in · A_inc / 2^N   (4.32)
This equation shows that the output frequency depends linearly on the input frequency. Such a resampler architecture can generate 2^N equally spaced values of f_out in the range 0 . . . f_in. The increment A_inc can be larger than 2^N, in which case the resampler performs downsampling and f_out is larger than f_in. In our case the input bandwidth is constant at 0.25, while the output bandwidth can be varied between 0 and 0.5.
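A sketch of the phase-accumulator arithmetic behind (4.32); the accumulator width and increment are example values, not taken from the design:

```python
N = 16                      # phase accumulator width in bits
A_inc = 40960               # 40960 / 2**16 = 0.625 input samples per output step

acc = 0                     # N-bit fractional phase accumulator
pos = 0                     # integer input-sample index
phases = []
for _ in range(8):          # generate control for 8 output samples
    phases.append((pos, acc / 2 ** N))   # (input index, sub-sample phase in [0,1))
    acc += A_inc
    pos += acc >> N         # carry out of the accumulator advances the input index
    acc &= (1 << N) - 1     # keep only the fractional part

# Normalized signal frequencies scale by A_inc / 2**N, eq. (4.32):
# here every output sample advances 0.625 input samples, so f_out = 0.625 * f_in
```

The integer part selects which input samples enter the filter, while the fractional part is exactly the phase word fed to the coefficient generator of Subsection 4.5.2.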
Unlike interpolation, such a resampler architecture requires the generation of output samples for a very large number of possible sub-sample phases. Storing coefficients for all these phases is impractical. As mentioned in Subsection 4.5.2, an efficient solution is to store coefficients for a predefined number of phases and linearly interpolate for all others. One of the aims of the present performance analysis is to investigate how the number of coefficient sets affects the resampling performance.
In the specific case of fading tap generation, resampling performance is defined in terms of spurious frequency components outside the desired Doppler bandwidth. Both the highest peak and the variance of these spurious components are of interest. In order to determine them, the testbench in Figure 4.40 is used. The low-pass filter limits the noise bandwidth to 0.25 and has a very sharp cut-off and a flat response in the pass-band. It is an elliptic filter of order 16, with a pass-band ripple of 0.1 dB and a stop-band attenuation of 120 dB, and is scaled so that the noise variance after filtering is 1. It is implemented as 8 cascaded second-order sections (bi-quads). The tough constraints force the poles very close to the unit circle, which makes a straightforward direct-form implementation impossible. Even with double-precision floating-point coefficients, such a filter would be unstable.
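Such a band-limiting filter can be reproduced with SciPy's standard design routines; this is a sketch following the stated specification, not the original implementation. Designing directly in second-order sections avoids the instability of the direct form:

```python
import numpy as np
from scipy import signal

# 16th-order elliptic low-pass: 0.1 dB pass-band ripple, 120 dB stop-band
# attenuation, cut-off 0.25 (sampling rate normalized to 1, Nyquist = 0.5).
# output='sos' yields the filter as cascaded bi-quads; a single direct-form
# transfer function of this order would be unusable in double precision.
sos = signal.ellip(16, 0.1, 120, 0.25, btype='low', output='sos', fs=1.0)

rng = np.random.default_rng(0)
x = signal.sosfilt(sos, rng.standard_normal(1 << 16))
x /= x.std()               # scale so the filtered noise has unit variance
```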
The high-pass filter isolates only the desired out-of-band spurious components. Its constraints are identical to those of the low-pass filter. The cut-off frequency would ideally be the highest frequency in the output spectrum, f_Dmax. However, due to the finite width of the transition bands, the cut-off frequency is chosen to be 1.1 · f_Dmax. This will only introduce a small error when computing the variance of the spurious components.
One way to evaluate the purity of the interpolated signal is by spectral analysis of the out-of-band resampling artifacts. The power spectral density (PSD) is obtained using Welch's method [96] with an FFT window size of 4096 (2^12), which ensures a good trade-off between accuracy and computational efficiency. The PSD of the original signal with f_Dmax = 0.25 is shown in Figure 4.41, where the decaying tail is a side-effect of the spectrum estimation method.
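The PSD estimation and out-of-band power measurement can be sketched with SciPy's Welch implementation; white noise stands in here for the resampler output, and the 1.1 · f_Dmax cut-off mentioned above is reused:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 16)        # stand-in for the resampler output

# Welch PSD estimate with a 4096-point (2**12) FFT window
f, psd = signal.welch(x, fs=1.0, nperseg=4096)

# Power of the spurious components above the high-pass cut-off 1.1 * f_Dmax
f_dmax = 0.25
mask = f > 1.1 * f_dmax
spur = psd[mask].sum() * (f[1] - f[0])  # rectangle-rule integration of the PSD
```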
The spectrum of the spurious components depends on the spectrum of the original signal and on the frequency response of the interpolation filter. Figure 4.42 shows the spurious interpolation artifacts in the case of a 6-tap filter, for an input bandwidth of 0.25 and an interpolation factor of 8. The corresponding frequency response of the interpolation filters is plotted on the same axes. It is now apparent why Lagrange interpolation is better than windowed-sinc types for lower input bandwidths.
The results presented so far assumed a power-of-2 interpolation factor and exact filter coefficients. In reality, however, the upsampling factor is a rational number, which causes the intermediate sub-sample phase to take many different values. As the coefficients are computed by linear interpolation between a fixed number of coefficient sets, usually a power of 2, this will contribute to the overall interpolation error. Our goal is to evaluate this error and to understand its variation with the interpolation factor. To this end we use the testbench in Figure 4.40 and measure the variance of the out-of-band components in a given range of interpolation factors.
As in the previous analysis, the input signal has a flat spectrum and a bandwidth of 0.25. The resampling factors are chosen so that the bandwidth of the resampled signal covers four octaves, between 1/32 and 1/2. Figure 4.43a shows the results for Lagrange interpolation with 4, 6, and 8 taps. The number of poly-phase coefficient sets has been chosen sufficiently large (256), so that the results are not affected by it. As expected, at bandwidths 0.25 (no resampling) and 0.5 (decimation by 2) the error is zero. The reason is that no intermediate samples are generated and the output consists of original samples only. The error is otherwise relatively constant, varying only slightly with the frequency.
Figure 4.43b shows the same analysis in the case of 4 coefficient sets. The results show that the linear-interpolation errors create an additional error floor, so that the error for 6 and 8 taps is the same. Unlike the ideal case, the error now depends on the actual resampling factor. For resampling factors 1/2 and 1/4 the error reaches the ideal level in Figure 4.43a, because the phase only takes values for which coefficient sets are precomputed and no linear interpolation is needed: 0, 1/4, 2/4, and 3/4.
As the coefficients need to be stored, the number of poly-phase sets directly affects the hardware complexity. This is less of an issue in software, where a few extra bytes of constant storage are easily available. In order to determine the optimum number of poly-phase coefficient sets we need to know how this number affects the error floor. The test configuration in Figure 4.40 is also used in this case. We consider interpolation filters with 4, 6, and 8 taps respectively, for a resampling factor and output bandwidth at which no error minimum occurs, such as 0.1. Figure 4.44 shows the resampling artifacts variance for Lagrange and signal-matched interpolation.
The conclusion that can be drawn is that the number of coefficient sets must be correlated with the filter size for a target interpolation performance. It can be seen that a 4-tap signal-matched interpolation filter with 8 coefficient sets offers excellent interpolation results, with a -70 dB variance of the out-of-band artifacts, relative to the variance of the signal. It can also be observed that for Lagrange interpolation with 4 and 256 coefficient sets the results are consistent with those presented in Figure 4.43.
[Four plots: interpolation MSE (dB) vs. bandwidth before interpolation (1/8 to 1), each for 4, 6, 8, 12, and 16 taps; (a) Lagrange, (b) Signal-matched, (c) Lanczos, (d) Truncated sinc]
Figure 4.38.: MSE vs. bandwidth for various interpolation filters
[Two plots: interpolation MSE (dB) vs. bandwidth before interpolation (1/4 to 1) for Lagrange, signal-matched, and Lanczos interpolation; (a) 8 taps, (b) 16 taps]
Figure 4.39.: MSE vs. bandwidth for two interpolation filter sizes
[Block diagram: AWGN generator → low-pass filter (fc = 0.25, fmax = 0.25) → resampler DUT (K = M/N, fmax = 0.25/K) → high-pass filter (fc = 1.1 · 0.25/K) → analysis]
Figure 4.40.: Resampler spurious components measurement
[Plot: power (log scale) vs. normalized frequency 0 to 1 of the band-limited test signal]
Figure 4.41.: PSD of the band-limited signal used for testing the resampler performance
[Two plots: power (log scale) vs. normalized frequency 0 to 1 of the out-of-band artifacts; (a) 6-tap Lagrange interpolation, (b) 6-tap Lanczos interpolation]
Figure 4.42.: Spectrum of the out-of-band interpolation artifacts
[Two plots: out-of-band components variance (dB) vs. output bandwidth (log scale, 1/32 to 1/2), for 4, 6, and 8 taps; (a) exact coefficients, (b) 4 coefficient sets]
Figure 4.43.: Resampling artifacts variance vs. output bandwidth
[Two plots: out-of-band components variance (dB) vs. number of poly-phase coefficient sets (2 to 256), for 4, 6, and 8 taps; (a) Lagrange interpolation, (b) Signal-matched interpolation]
Figure 4.44.: Resampling artifacts variance vs. number of coefficients sets