DS811 - LogiCORE IP Digital Pre-Distortion v4
• TDD support with automatic data selection
Example Design: Not Provided
© Copyright 2010 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, ISE and other designated brands included herein are trademarks of Xilinx in the United States
and other countries. All other trademarks are the property of their respective owners.
Applications
An easy-to-use software interface allows configuration, single-stepping and continuous automatic operation while
providing access to signal measurements, data, diagnostic and status information.
Usage Overview
This section briefly summarizes a sequence of events for successful incorporation of DPD into a radio unit FPGA.
Later sections provide the necessary detail.
Instantiation
1. The DPD component is added into the user's HDL code with appropriate clocks and interfacing.
2. DPD is placed after CFR in the transmit chain.
3. The design is compiled.
4. A SW environment for reading and writing the host interface is established.
Functional Description
Mathematical Foundation
Digital Pre-Distortion (DPD) acts on transmitted data to cancel the distortion in the PA by implementing an inverse
model of the amplifier. In the conceptual view of Figure 1, the pre-distortion function is applied to the sequence of
(digital) transmitted data x(n). It models the non-linearity of the PA.
The processes involved are the formulation of the model on which the pre-distortion function is based, and the estimation of its parameters from samples of the PA input and output. To separate out the linear effect of the PA and the circuitry that drives it, estimation is based on the aligned PA output y(n). The alignment process matches the amplitude, delay and phase variations of y0(n) to z(n). The pre-distorter is then dedicated to modeling only the non-linear effects for which it is intended. Alignment and estimation blocks are depicted in Figure 1.
$$z(n) = \sum_{i} h_1(i)\,x(n-i) \;+\; \sum_{i_1}\sum_{i_2} h_2(i_1,i_2)\,x(n-i_1)\,x(n-i_2) \;+\; \sum_{i_1}\sum_{i_2}\sum_{i_3} h_3(i_1,i_2,i_3)\,x(n-i_1)\,x(n-i_2)\,x(n-i_3) \;+\; \cdots$$

Equation 1: Physical Volterra Series
Without loss of generality, the Volterra series can be written as
$$z(n) = \sum_{q=0}^{Q-1} F_q\, x(n-q)$$

Equation 2: Nonlinear Moving Average Form of the Volterra Series
In Equation 2, the Fq may be called memory terms. If Equation 2 is to model a power amplifier, it must conform to the boundary condition that when the signal amplitude |x| is small, the model reduces to a linear time-invariant system (since the PA is linear for small signals). A sufficient condition for this is that the memory terms depend only on samples of the signal magnitude, |x(n − p)| with p ≤ q.
$$F_q = \sum_{\{i\}} a_{\{i\}q}\, f_{\{i\}q}$$

Equation 3: Series Expansion of the Memory Terms
where the set {i} covers the number of indices required to form the terms. Comparison with Equation 1 shows that the basis functions f{i}q are of the form |x(n − s)|^a |x(n − t)|^b |x(n − v)|^c ···. These can be called Volterra products. The simplest possible form for f{i}q is unity (a = b = c = ··· = 0); in this case the series is an FIR filter.
Xilinx DPD versions 1 and 2 used the well-known Memory-Polynomial (MP) model. In this model, {i} is one index k running from 0 to K − 1 and f_kq = |x(n − q)|^k. This corresponds to selecting the diagonal Volterra series terms. The memoryless pre-distortion model is a special case of this with Q = 1.
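As an illustrative sketch, the Memory-Polynomial pre-distorter implied by Equations 2 and 3 can be expressed in a few lines of Python. This is not the core's implementation, and the coefficient layout a[k, q] is an assumption made for this example.

```python
import numpy as np

def memory_polynomial(x, a):
    """Memory-Polynomial pre-distorter (illustrative, not the core's HW).

    x : complex baseband samples, shape (N,)
    a : coefficients a[k, q], shape (K, Q) -- layout assumed for this sketch
    Returns z(n) = sum_q sum_k a[k, q] * |x(n-q)|**k * x(n-q).
    """
    K, Q = a.shape
    z = np.zeros(len(x), dtype=complex)
    for q in range(Q):
        xq = np.roll(x, q)      # delayed samples x(n - q)
        xq[:q] = 0              # zero the samples that wrapped around
        for k in range(K):
            z += a[k, q] * np.abs(xq) ** k * xq
    return z

# With K = Q = 1 and a[0, 0] = 1 the model is a unity-gain pass-through.
x = np.exp(1j * np.linspace(0.0, 1.0, 8))
assert np.allclose(memory_polynomial(x, np.array([[1.0 + 0j]])), x)
```

Setting Q > 1 with K = 1 reduces the model to an FIR filter, matching the f{i}q = 1 case noted above.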
In this DPD version, there are options to use the Memory-Polynomial model, and also to configure models based on a more general selection of terms. With these configurations, improved correction performance is observed, particularly for wideband signals. The pre-distortion correction architecture is an option for core generation and software configuration. Four possible architectures, called A, B, C and D, can be selected, having increasing complexity of diagonal and off-diagonal memory terms. In addition, the polynomial order can be selected to be 5 or 7; the polynomial order is the maximum value of a, b, c, … appearing in the Volterra products used.
Collecting the model over a capture of samples gives the linear system

$$U A = Z$$

Equation 4: Linear System for the Pre-distortion Coefficients

where Z is a column vector of the signal samples z(n), A is a column vector of all the a{i}q, and the rows of U are the elaboration of all the (|x(n − s)|^a |x(n − t)|^b |x(n − v)|^c ···) x(n − q) in the model for each x(n) = y(n), the samples of the aligned PA output.
Equation 4 can be solved by pre-multiplying each side by U^H, the Hermitian transpose of U, to give

$$V A = W$$

Equation 5: System to be Solved for the Pre-distortion Coefficients

In Equation 5, V = U^H U and W = U^H Z. It is a linear system whose solution is the best least-squares estimate for a{i}q over the sample length L.
An entirely new set of coefficients can be obtained from each new data capture. Alternatively, the coefficients can be iterated with the Damped-Newton method.

The solution to VA = W can be expressed as A = V \ W. In this notation, the Damped-Newton method iterates A according to A_{n+1} = A_n + μ (V \ W_E), where W_E = U^H E and E = Z − U A_n. It iteratively acts to minimize an error vector E, which is the difference between the transmitted samples and the predicted transmission based on the inverse model over the receive samples. The damping factor μ is an adjustable parameter.
This is a general mathematical method which, when applied to DPD, improves immunity to noise and distributes
noise that is non-uniform in time over many updates, thus making the typical instantaneous behavior equal to the
mean behavior.
The Damped-Newton iteration is used only when the power is stable, as it cannot react to fast dynamics.
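The estimation equations can be checked numerically. The sketch below models Equations 4 and 5 and the Damped-Newton iteration in Python; it illustrates the mathematics only and is not the MicroBlaze ECF implementation.

```python
import numpy as np

def ls_solve(U, Z):
    """Solve V A = W (Equation 5) with V = U^H U, W = U^H Z: the best
    least-squares estimate of A over the captured samples."""
    V = U.conj().T @ U
    W = U.conj().T @ Z
    return np.linalg.solve(V, W)

def damped_newton_step(U, Z, A, mu):
    """One Damped-Newton iteration: A' = A + mu * (V \\ U^H E),
    where E = Z - U A is the error vector."""
    E = Z - U @ A
    V = U.conj().T @ U
    return A + mu * np.linalg.solve(V, U.conj().T @ E)

# Sanity check: with mu = 1, a single step from A = 0 reaches the full
# least-squares solution; smaller mu trades speed for noise immunity.
rng = np.random.default_rng(0)
U = rng.standard_normal((100, 4)) + 1j * rng.standard_normal((100, 4))
Z = U @ (rng.standard_normal(4) + 1j * rng.standard_normal(4))
A0 = np.zeros(4, dtype=complex)
assert np.allclose(damped_newton_step(U, Z, A0, mu=1.0), ls_solve(U, Z))
```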
System Features
Hardware Description
Figure 2: Xilinx DPD HW Block View (transmit path: IQ data, DPD, DAC, RF upconverter, PA; observation path: RF downconverter, ADC; MicroBlaze processor subsystem with host interface and DPD control)
Figure 2 shows that DPD is placed after CFR in the transmit signal chain. DPD operates at the DPD sample rate fs.
The selection of fs is discussed in Factors Influencing Expected Correction Performance. The signal is converted into
the analog domain by the DAC component. There may be further interpolation stages between DPD and the DAC.
Moreover, there may be digital mixing for a single-DAC superheterodyne transmitter or IQ DACs for a direct-conversion transmitter. Whatever choices are made, based on system-level considerations, the net result is that the IQ
data from DPD eventually appears as modulation of an RF carrier wave at the PA. To estimate pre-distortion coef-
ficients, a sample of the PA output is fed back via the observation path and must finally be presented to the estima-
tor as IQ samples at fs. For optimal sampling bandwidth, either direct RF downconversion followed by an IQ ADC
pair or a heterodyne mixing to fs/4 and an ADC sampling at 2fs should be used. Sub-optimal sampling bandwidth
can also lead to good pre-distortion performance, depending on the signal. DPD supports a single ADC at fs and
variable feedback IF frequency. For single ADC architectures, digital downconversion is required, and this is per-
formed on the data prior to the estimation processing.
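For a feedback IF at fs/4 sampled by a single ADC, the digital downconversion reduces to multiplying the real samples by e^(−jπn/2), the repeating sequence 1, −j, −1, j, followed by low-pass filtering. The sketch below illustrates the mixing step only; it is not the core's internal implementation.

```python
import numpy as np

def fs4_mix(real_samples):
    """Mix a real IF signal centered at fs/4 down to complex baseband.

    Multiplies by exp(-j*pi*n/2) = 1, -1j, -1, 1j, ...  A low-pass
    (image-reject) filter would normally follow; it is omitted here.
    """
    n = np.arange(len(real_samples))
    return real_samples * np.exp(-0.5j * np.pi * n)

# A cosine at exactly fs/4 mixes to a DC term of 0.5 plus an image
# that the subsequent low-pass filter would remove.
n = np.arange(16)
bb = fs4_mix(np.cos(0.5 * np.pi * n))
assert np.isclose(bb.mean(), 0.5)
```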
Within DPD, the data path contains resources to deal with the real-time processing required for up to eight individ-
ual antennas. A single MicroBlaze™ processor sub-system performs estimation and supporting algorithms – it is
shared between the antennas. Details of the system are given in the following sections.
The host interface is a shared-memory data and message-passing subsystem.
HW-SW Co-design
Xilinx DPD is a combination of HW and SW processes that between them realize the PA distortion inverse model
and the estimation algorithm, as described in the Mathematical Foundation section, along with features that make
for a fully engineered, practical, robust and self-contained solution.
Figure 3 depicts the main elements of the DPD solution. The HW processes are contained within the data path and
the SW processes are run in the MicroBlaze processor code. There are also Quadrature Modulator Correction
(QMC) and Overdrive Detection (ODD) SW processes not indicated in the diagram.
Figure 3: Main Elements of the DPD Solution (data path HW processes and MicroBlaze SW processes)
Multipath Handling
The estimation process can update only one path at a time. The multipath DCL attends to each path in turn and esti-
mation is performed only if the power for the port being examined satisfies the criteria for coefficient update (the
same criteria as with the single-port design).
In the limiting case where only one path ever satisfies the criteria for coefficient update, that path will be continu-
ously re-estimated. This also means that if one path fails, DPD will operate correctly on the others.
Component Name
Enter the name of the core component to be instantiated. The name must begin with a letter and be composed of the
following characters: a to z, A to Z, 0 to 9, and '_'.
The IP Resource Utilization section provides a subset of the resource utilization data for different parameter combinations of the core. Similarly, the IP Timing Performance section provides some timing performance data. The spreadsheet included in the doc directory of the zip file contains data for all parameter combinations.
Input/Output Ports
Figure 6 displays the signal names; Table 1 defines these signals.
Figure 6: DPD Core Pinout
Inputs: clk, rst, proc_clk, proc_rst, accel_clk, ce_clr, din_i, din_q, srx_din0, srx_din1, capture_sync, host_interface_clk, host_interface_addr, host_interface_din, host_interface_we
Outputs: ceN_out, dout_i, dout_q, srx_path_sel, host_interface_dout
Interfaces
The DPD IP core requires various interfaces to user logic for successful operation of the design. In this section, the
clock, reset, data path, and host access interfaces are explained in detail.
Clock Interface
This core requires up to four clock input signals.
clk
Signal clk is the primary clock used to drive all the data path logic within the design. Its clock frequency will be
CLOCKS_PER_SAMPLE times the input data rate of the DPD IP core. The input data rate will also be the sample
rate at the output of DPD.
proc_clk
Signal proc_clk is the processor clock used to drive all logic connected to the MicroBlaze subsystem. This clock
can be generated such that it is synchronous to the clk signal; however, it can also be asynchronous, since the
design has all required handshaking to ensure valid data transfer between domains.
accel_clk
Signal accel_clk is available when HWA is enabled. There is no particular minimum frequency for this signal; however, the higher the clock frequency, the faster the computations are performed. Running at the same rate as proc_clk should give acceptable acceleration. The effect of acceleration is quantified in the SW Features Timing Performance section, and the IP Timing Performance section discusses typical maximum speeds for this clock.
host_interface_clk
The signal host_interface_clk is an additional clock input that is used to access the host interface memory for
passing instructions to, and reading status from the DPD IP MicroBlaze processor.
The clk, proc_clk and accel_clk signals should be on the global clock network of the device, using global buffers (BUFGCTRL). The host_interface_clk has only one load, which is a 512x36 block RAM, so the DPD IP does not require this clock to be on the global network. However, it is expected that this interface will be accessed by the user's on-board or on-FPGA processor, so it makes sense to drive this clock signal from a global buffer created within the user logic.
These clocks can be generated using a combination of DCMs, PLLs and BUFGs, but care must be taken to account for jitter specifications when constraining the clock signals. The actual frequency at which these clocks can run ultimately depends on the other user logic, placement and timing constraints. For stand-alone characterization of the typical clock frequencies at which clk, proc_clk and accel_clk in the DPD IP can run, see IP Timing Performance.
The IP core does not contain any global clock resources such as global buffers (BUFGs) or input global buffers (IBUFG/IBUFGDS). When the user instantiates the design, these clocks should be driven from a global clock buffer (any of the variants of BUFGCTRL) external to the IP core. If the user chooses to let the synthesis tool infer the global buffers, care should be taken to ensure that each of the clock inputs has the expected clock buffer. The synthesis tool may not recognize that these ports, which are connected to the black-box DPD netlists, are in fact clock input ports. In such a case, cascaded BUFGs can be created. When a BUFG drives another BUFG using local routing instead of global clock routing resources, excessive clock skew can occur. This can cause various setup and hold violations in static timing analysis.
It is recommended that only stable clocks be connected to the IP when using a DCM or PLL to derive the clocks. In that case, BUFGCE (or BUFGMUX) can be used, and the lock signal can be used to derive the enable signal connected to the BUFGCE CE port.
Reset Interface
The two reset signals rst and proc_rst are active high. In the design, rst is registered using clk and ceN_out, while proc_rst is registered using proc_clk; each reset is then applied only to internal logic within its synchronized domain. These registers help with routing the reset signal throughout the design, and the synchronizing register also keeps fan-out management local to the design. The MAX_FANOUT attribute is applied to the internal version of the reset signals, with its value set to REDUCE. This instructs the map tool to employ physical synthesis rules to reduce their fan-out as required by the timing constraints. To assist in this, map should be run with -register_duplication turned on. The resets can be wired to a register that appears in the memory map interface of the on-board processor, thus facilitating a software reset feature. This usage, although not mandatory, is recommended.
It is recommended that the user connect some form of reset to PLLs and DCMs generating the DPD design clock to
ensure that there is a known startup sequence, and ensure that the DCMs and PLLs have locked before
de-activating the rst and proc_rst signals. It is recommended that the user also apply a new logical reset pulse
to both these signals once the clocks have stabilized.
Capture Synchronization
In capture mode 1 (see Setting DPD Parameters section) the capture is triggered when the input signal
capture_sync is held high during an active rising edge of clk as qualified by ceN_out.
Latency
If the system integration requires an understanding of the latency through the core, refer to the end-to-end latency
in data samples or ceN_out cycles for various parameter combinations, listed in Table 2.
Table 2: Latency per Transmit Antenna Path with QMC=FALSE (independent of POLY_ORDER and HWA)

                           CLOCKS_PER_SAMPLE
Performance Architecture     1    2    3    4
A                           23   22   19   19
B                           24   23   20   20
C                           25   24   21   21
D                           25   24   22   21
When QMC is enabled, latency per path increases by 4 data samples or ceN_out cycles.
The 2fs rate data generated from ADCs can be converted to srx_din0/srx_din1 signals by either taking advan-
tage of interleaving capabilities of various ADCs or, if the ADC is running at twice the rate, using ODDRs, a
dual-aspect FIFO or a simple demux circuit with a FIFO to provide the srx_din0 and srx_din1 with appropriate
data.
The fs rate data or complex IQ data from ADCs can be wired into the DPD ports using a simple asynchronous FIFO
to transfer data from ADC clock domain to the clk domain.
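The even/odd mapping of a 2fs feedback stream onto srx_din0/srx_din1 (consistent with the capture format described in Signal Analysis) can be modeled as a simple demultiplex:

```python
def deinterleave_2fs(samples):
    """Split a 2*fs ADC sample stream into the two fs-rate streams
    presented on srx_din0 (even-ordered samples) and srx_din1
    (odd-ordered samples). Illustrative model of the demux-plus-FIFO
    arrangement described above, not the FPGA circuit itself."""
    return samples[0::2], samples[1::2]

srx_din0, srx_din1 = deinterleave_2fs(list(range(8)))
assert srx_din0 == [0, 2, 4, 6] and srx_din1 == [1, 3, 5, 7]
```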
Host Interface
The Hardware Description section describes briefly the host interface and how it is tied into the MicroBlaze proces-
sor design. This interface allows the user to provide settings and instructions to the MicroBlaze processor about the
various functions it needs to perform. This interface uses a 512x32 block RAM in dual-port mode. The
host_interface_* ports are accessible to the user at the top-level to access one port of this memory.
The Operations Guide section provides the detail and usage of the memory map of this interface.
The hardware aspect of this interface is the same as accessing a port of any dual-port memory, and the user should
drive the host_interface_din/addr/we synchronously to host_interface_clk. The
host_interface_dout is output from the memory addressed by host_interface_addr when
host_interface_we = '0'. The block RAM has “WRITE_FIRST” mode set on the port. The port also has a latency
of 2 for read operations. Figure 8 shows the timing relationship between the host_interface signals.
For the SRx port, there is a single set of srx_din0 and srx_din1 inputs. The MicroBlaze processor assumes that the srx_din ports carry data for path N when srx_path_sel is driven to N, for N = 0 to 7. If the number of paths is set to 1, users can leave the srx_path_sel output port unconnected.
If there are independent ADCs for all antennas, the switching should be done in the FPGA using srx_path_sel.
If the board has an RF switch to select between the output of the various PAs and then send the signal to a single
ADC, srx_path_sel should be forwarded to that RF switch. If direct hardware connection to the RF switch or
mux is not possible, the Antenna Selection Options in a Multipath Installation section should be consulted.
Using Constraints
Users should take advantage of multi-cycle path constraints and cross-clock domain constraints to get better and
faster static timing closure on their design. The DPD core is optimized to benefit from this. Users can add period
constraints mentioned in Table 5 if the clocks are independently generated. The period constraints may get derived
by the tools, if these clocks are generated from DCM/PLL/MMCM. The NET names used in this table should be
updated with the appropriate signal name as connected to the DPD core’s netlist clock inputs. Users should replace
the string "<*CLK_PERIOD>" with appropriate period values in nanoseconds.
# HWA Clock Input - Uncomment the next two lines, if HWA is selected
# NET "accel_clk" TNM_NET = "TNM_accel_clk";
# TIMESPEC "TS_accel_clk" = PERIOD "TNM_accel_clk" <ACCEL_CLK_PERIOD> ns HIGH 50%;
For multi-cycle path constraints, ceN_out is available as an output port on the DPD netlist allowing users to create
a multi-cycle path constraint as shown in Table 6. Net name should be replaced with the signal name used to
connect to ceN_out output port when instantiating the DPD core. Users should replace the string
"<CEN_PERIOD>" with appropriate period values in nanoseconds. It is typically <CLK_PERIOD> value replaced
in Table 5, multiplied by CLOCKS_PER_SAMPLE.
# dpd_inst : dpd_v4_0_component_name
# port map ( ….
# Multi-cycle path constraint (uncomment the next two lines, when CLOCKS_PER_SAMPLE=2,3 or 4)
# NET "dpd_ce_out" TNM_NET = "ce_N_group";
# TIMESPEC "TS_ce_N_group_to_ce_N_group" = FROM "ce_N_group" TO "ce_N_group" <CEN_PERIOD> ns;
#If the DPD instantiation is at a lower hierarchy, then update the NET name to reflect hierarchy.
Since the DPD IP uses up to four clocks and one multi-cycle path domain, it is useful to exploit the robust clock-domain-crossing logic within the DPD design by adding the cross-clock domain constraints shown in Table 7. These rely on the timing groups defined in Table 5, so users should carefully replace the timing group names if they do not match in their environment. Apply caution when using TIG constraints: evaluate whether any of the user code (non-DPD netlist) is covered by these timing groups and timespec constraints, and if so, verify that the TIG constraints can be validly applied to the user logic as well. Consult Xilinx Support if you are concerned about the applicability of these constraints in your design.
Running Simulation
Users have the option to simulate the netlist generated, using the unisim based model provided during generation.
Users should ensure that precompiled unisim libraries are available for the correct simulation tool version before
proceeding to simulate.
A typical simulation testbench includes:
• Clock source generators
• Reset generators (ensure ce_clr is de-asserted first, before rst and proc_rst are de-asserted)
• Data generators - din_* and srx_din*, capture_sync signals (ensure these are generated and fed
synchronous to clk and ceN_out)
• Data collectors - dout*, srx_path_sel signals (ensure these are sampled and collected synchronous to clk
and ceN_out)
• Host interface drivers (ensure that this interface is synchronous to host_interface_clk)
As the DPD netlist includes a processor and software code, simulation will be slow. When QMC is not incorporated
in the netlist, the DPD netlist exhibits pass through behavior after reset is released; the latency of the pass through
is described in the Latency section. When QMC is enabled, the DPD netlist requires the internal MicroBlaze
processor to initialize QMC; this introduces a long period between reset being released and the design exhibiting
pass through behavior. During the initialization period, the DPD output may be undefined and it can take up to
25000 proc_clk cycles before input data is passed to the output with no modification. The actual number will be
lower for smaller values of TX and ARCH and when HWA is absent.
To exercise the host interface with various commands, refer to the Operations Guide section for the correct interaction procedure.
Operations Guide
Host Interface and SW Control Modes
DPD is controlled via the host interface RAM (see Host Interface). It is a port of a shared memory in the MicroBlaze
processor subsystem that will typically be connected to a microcontroller bus in the control plane of the application.
DPD operations are activated by writing data to addresses in the RAM. Status, results, diagnostics and data are
accessed by reading data from addresses.
The host interface memory map is organized into the regions as shown in Table 8. In what follows, individual
addresses are specified. Addresses not explicitly referenced should be considered as reserved except for the range
216 to 255, which should be considered unused. For ease of reference and support, upper-case mnemonics are defined for key addresses, parameters, control modes and values. Where necessary, the associated value is shown in parentheses following the mnemonic.
DPD features are executed via triggering control modes. They are like function calls with optional parameters that
influence the behavior of the data path and internal state, in addition to returning results. Control modes are pro-
vided that allow the user to configure DPD, run single-step estimation, run the DCL and access measurements, read
data and status information for setup, debug, and monitoring. Table 9 details the general registers involved with
control modes.
Parameters may be written into the host interface RAM at any time, but are not activated until a control mode is executed. For unmarked items, the command mode is UPDATE_ECF_PARAMETERS(17); otherwise the command mode is: * SET_CAPTURE_PARAMS(11), ** SET_METER_LENGTH(6), *** SET_DCL_PARAMETERS(12).
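A host-side control sequence can be sketched as follows. PORTNUM(2) and the control-mode numbers are taken from this guide; the command-register address (COMMAND_ADDR) and the read/write primitives are hypothetical placeholders for the user's actual bus access (see Table 9 for the real register assignments).

```python
# Hypothetical addresses -- COMMAND_ADDR is a placeholder only;
# PORTNUM(2) and the mode numbers are from this guide.
COMMAND_ADDR = 0
PORTNUM_ADDR = 2
DPD_ON, DPD_OFF = 8, 7

class HostInterface:
    """Toy model of the 512x32 host interface RAM. In hardware, reads
    have a two-cycle latency on host_interface_clk (not modeled)."""
    def __init__(self):
        self.ram = [0] * 512

    def write(self, addr, value):
        self.ram[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        return self.ram[addr]

def run_control_mode(hi, mode, port):
    # Parameters may be written at any time; they only take effect
    # when a control mode is executed.
    hi.write(PORTNUM_ADDR, port)
    hi.write(COMMAND_ADDR, mode)

hi = HostInterface()
run_control_mode(hi, DPD_ON, port=0)
assert hi.read(COMMAND_ADDR) == DPD_ON
```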
1. When the core is instantiated, the performance architecture GUI selection sets the hardware provision for the pre-distortion
function. By default, the ECF recognizes this and computes the appropriate number of coefficients. However, for evaluation
purposes the ECF can use a lesser degree than provisioned in the core. ARCH_SEL in Table 11 may be set using the
following relationship to enable non-default configurations.
ARCH_SEL = x + (y − 5) × 2^3

where x is 1, 2, 3 or 4, representing ARCH = A, B, C and D respectively, and y is the polynomial value POLY_ORDER = 5 or 7. For example, D/7th order is ARCH_SEL = 20 and C/5th order is ARCH_SEL = 3.
The higher values of ARCH_SEL are normally beneficial only for signals where the occupied bandwidth is greater than
around one sixth to one seventh of the pre-distortion bandwidth fs, and ARCH_SEL should be set to the minimum value
required to achieve best performance. See Sample Rates.
Monitors
Various information is always available from the host interface RAM. This is detailed in Table 12.
Each power is a 64-bit value representing the sum of the individual powers of the number of samples specified in METERLENGTH, divided by 256. To convert to dBFS, the formula is:

10 × log10(256 × power / (2^30 × METERLENGTH))
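As a worked check of this formula (taking 2^30 as the full-scale per-sample power implied by it, an assumption for this example), an accumulated reading at full scale converts to 0 dBFS:

```python
import math

def power_to_dbfs(power, meter_length):
    """Convert an accumulated power reading (already divided by 256)
    to dBFS using the formula above."""
    return 10 * math.log10(256 * power / (2 ** 30 * meter_length))

# If every one of METERLENGTH = 1024 samples has power 2**30, the
# accumulated value divided by 256 is 2**32, which reads 0 dBFS.
assert power_to_dbfs(2 ** 32, 1024) == 0.0
```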
An ECF error can indicate, for example, a failed observation receiver or the occurrence of severe interference. If the ECF does encounter an error, the coefficients are not updated or stored. All ECF parameters are relevant to the DCL.
Table 13: DCL Control Modes

Mode Number | Mnemonic               | Description
14          | RUN_DCL                | Run the DCL.
27          | RUN_DCL_WITH_QMC       | Run the DCL with QMC updates enabled.
23          | RUN_DCL_WITH_ACCEL_QMC | Run the DCL with QMC updates enabled and QMC initial convergence emphasized.
18          | EXIT_DCL               | Stop the DCL while retaining internal state.
36          | RESET_DCL              | Reset the DCL internal state. Executing SET_DCL_PARAMETERS(12) also does this.
Single Stepping
For fine control, parameter adjustment, debug, general understanding and non-standard applications, DPD SW features can be individually activated via the control modes given in Table 15. PORTNUM(2) needs to be specified.
Table 15: Single Stepping Control Modes

Mode Number | Mnemonic                 | Description
2           | COMPUTE_NEW_COEFFICIENTS | Perform a full Least-Squares update of the DPD coefficients. This includes capturing new samples, ECF processing and updating the data path parameters.
3           | RESET_COEFFICIENTS       | Set all coefficient sets to unity gain and update the data path for pass-through.
7           | DPD_OFF                  | Update the data path for pass-through without changing the internally stored coefficients.
8           | DPD_ON                   | Update the data path from the internally stored coefficients.
21          | RESET_QMC                | Reset the internally stored QMC coefficients and set the QMC data path block to pass-through.
22          | QMC_SINGLE_STEP          | Update QMC by making a single iteration.
24          | QMC_ON                   | Update the QMC data path block from the internally stored coefficients.
25          | QMC_OFF                  | Update the QMC data path block for pass-through without changing the internally stored coefficients.
33          | DAMPED_UPDATE            | Perform a Damped-Newton iteration of the DPD coefficients. This includes capturing new samples, processing the ECF, and updating the data path parameters.
Signal Analysis
To aid setup and debug, the control modes shown in Table 16 give access to the signals processed by DPD and mea-
surements made by DPD on those signals. PORTNUM(2) needs to be specified.
A capture can be triggered and the captured data, the (transmit) power and histogram can be read out.
Transfer of bulk data uses a paged mechanism. Each page is 128 words long and is available at addresses 384-511. The capture RAM is 8192 samples long and therefore requires 64 page accesses. The parameter PAGENUMBER (address 122) specifies a page number from 0 to 63. The first 4096 samples are the transmit data; each 32-bit word is a concatenation of two 16-bit two's-complement values for the I (LSW) and Q (MSW) samples. The upper 4096 samples are the receive data, again a concatenation of two 16-bit two's-complement values. When the receiver is in 2*fs mode, these are the even- and odd-ordered samples of an 8192-sample sequence. In 1*fs mode, the odd samples are ignored, and in IQ mode the format packs the two 16-bit SRx inputs into a single 32-bit value. In all modes, the SRx data appears in captured samples as srx_din0*2^16 + srx_din1. Table 3 provides details of the srx_din0/srx_din1 interface.
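The 32-bit capture words can be unpacked into signed 16-bit pairs as follows; this helper is an illustration for off-line analysis, not part of the core.

```python
def unpack_capture_word(word, srx=False):
    """Split a 32-bit capture word into two signed 16-bit values.

    Transmit words: I is the LSW, Q is the MSW -> returns (I, Q).
    SRx words: word = srx_din0 * 2**16 + srx_din1
               -> returns (srx_din0, srx_din1).
    """
    def s16(v):
        v &= 0xFFFF
        return v - 0x10000 if v & 0x8000 else v
    lsw, msw = s16(word), s16(word >> 16)
    return (msw, lsw) if srx else (lsw, msw)

# I = -1 (0xFFFF in the LSW), Q = +2 (MSW)
assert unpack_capture_word(0x0002FFFF) == (-1, 2)
```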
The histogram integrated over METERLENGTH can also be read out. The histograms are 256 bins long; each bin counts the number of samples whose amplitude, divided by 128, falls into that bin.
The capture power is the sum of the capture signal power over the number of samples specified in
SAMPLES2PROCESS(96).
Examples of uses for the signal analysis control modes are to:
• check the transmit and receive spectra (by analyzing the captured data in a tool such as MATLAB®) and
thereby verify that the signal source, RF paths, core interfaces and relevant DPD parameters are correct.
• check the CFR configuration (by examining the transmit histogram).
• determine appropriate settings for CAPTUREDELAY(118) in capture mode 1 by examining the capture
histogram and powers relative to the measurements over METERLENGTH.
Note: In the event of contacting Xilinx Support, it is useful to have the signal analysis data described in this section.
Notes:
1. For Virtex-5 the BRAM usage is reported as a number of 36K BRAMs. Typically 2x18K BRAMs are combined and reported as 36K.
Notes:
1. In some configurations Virtex-6 uses a number of 18K BRAMs in addition to full 36K BRAMs.
Notes:
1. Spartan-6 cases use a number of 9K BRAMS in addition to full 18K BRAMs.
IP Timing Performance
DPD IP was characterized for resource utilization and timing performance on Virtex-5, Virtex-6 and Spartan-6
according to the constraints shown in Table 21. These are example cases of when a single DPD IP core is placed and
routed in an otherwise empty fabric. A user application might see a different performance (better or worse). It is rec-
ommended that the user place and route the desired DPD design configuration in the user application space with
representative logic around it or with representative area groups for floorplanning. Contact your Xilinx Field Appli-
cations Engineer if you have timing issues or if guidance on floorplanning is required. For 8 TX cases, an area group
is recommended to achieve these clock frequencies.
Table 23: SW Features Timing (seconds) for POLY_ORDER = 5 and HWA enabled

                                    SAMPLES2PROCESS
                                1000          2000          4000
                              LS     dN     LS     dN     LS     dN
ECF, ARCH A                  0.34   0.34   0.43   0.42   0.58   0.60
ECF, ARCH B                  0.39   0.39   0.47   0.49   0.63   0.66
ECF, ARCH C                  0.46   0.47   0.54   0.57   0.72   0.72
ECF, ARCH D                  0.46   0.47   0.55   0.56   0.74   0.76
Non-fs/4 down-conversion     ECF + 0.17    ECF + 0.17    ECF + 0.17
DCL cycle with QMC, per port ECF + 0.25    ECF + 0.25    ECF + 0.25
Table 25: SW Features Timing (seconds) for POLY_ORDER = 7 with HWA enabled

                                    SAMPLES2PROCESS
                                1000          2000          4000
                              LS     dN     LS     dN     LS     dN
ECF, ARCH A                  0.38   0.39   0.46   0.48   0.66   0.63
ECF, ARCH B                  0.49   0.50   0.59   0.60   0.79   0.77
ECF, ARCH C                  0.69   0.70   0.79   0.81   1.00   0.99
ECF, ARCH D                  0.66   0.66   0.74   0.75   0.96   0.97
Non-fs/4 down-conversion     ECF + 0.17    ECF + 0.17    ECF + 0.17
DCL cycle with QMC, per port ECF + 0.25    ECF + 0.25    ECF + 0.25
WCDMA
Multicarrier data consisting of 3GPP Test Model 1 with 64 DCH is generated. Each carrier has a relative offset of 512
chips. The data is pulse-shaped, upsampled to 122.88 Msps, frequency shifted and summed.
Figure 9 and Figure 10 show the spectra before and after pre-distortion, for the ARCH_SEL selections indicated, for
the carrier configurations stated in the captions.
Figure 9: Spectra for Four WCDMA Carriers before and after DPD
Figure 10: Spectra for Two Non-adjacent WCDMA Carriers 10 MHz Apart before and after DPD
WiMAX
Data is generated using Agilent Signal Studio, which is standards compliant, having arbitrary zone, burst and mod-
ulation structure. It is 5 ms TDD frame data. Figure 11 shows the spectrum before and after pre-distortion for two
10 MHz carriers (one having 75% downlink active ratio and the other having 50% downlink active ratio).
Figure 11: Spectra for Two 10MHz WiMAX Carriers before and after DPD
LTE
Data is generated using internally developed software. The data is standards compliant with respect to frame struc-
ture and modulation. The modulation scheme is 64 QAM and the data payload is random. Figure 12 shows results
for one 20 MHz carrier.
Figure 12: Spectra for a Single 20MHz LTE Carrier before and after DPD
TD-SCDMA
Data is generated using internally developed software. The data is standards compliant with respect to frame struc-
ture and modulation. The data payload is random. Figure 13 shows the results for an arbitrary selection of carriers
within a 15 MHz bandwidth.
Figure 13: Spectra for Six TD-SCDMA Carriers in 15 MHz Total Bandwidth before and after DPD
Multicarrier GSM
Data is generated using internally developed software. The data is standards compliant with respect to frame struc-
ture and modulation, which is GMSK. The data payload is random. Figure 14 shows the result for an arbitrary selec-
tion of carriers within a 20 MHz bandwidth.
Figure 14: Spectra for Four GSM Carriers in 20 MHz Total Bandwidth before and after DPD
Figure 15: Dynamic Performance: Adjacent Channel Ratio for Four WCDMA Carriers
with Total Power Varying with Slow Steps, Fast Steps and Fast Random Profiles
Figure 16: QMC Performance: Spectra for a Single Offset WCDMA Carrier
before and after QMC and DPD Correction
Sample Rates
Performance depends on the sample rate of DPD. A rule of thumb is that the pre-distortion sample rate fs should be at least five times the signal bandwidth. However, factors such as the PA design, the degree of correction required and the signal type also come into play. Non-contiguous carrier configurations generally require a higher DPD sample rate than contiguous configurations.
Excess pre-distortion bandwidth can also be a problem. Occasionally wideband artifacts can be observed when fs is
greater than approximately seven times the signal bandwidth, particularly if ARCH_SEL is set to 2, 3 or 4.
For optimal receive bandwidth, the sample rate of the DPD sample receiver should be exactly twice the DPD sample rate, with the signal centered in a Nyquist zone. However, variations are supported. DPD can be configured for a sample receiver running at one times the DPD sample rate, and in many situations there is little performance degradation; non-contiguous carrier configurations, however, may be particularly problematic.
There is also support for a signal not exactly centered in a Nyquist zone. If the offset is small, there may be little
impact on performance.
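The rate guidance above (at least 5x the signal bandwidth, a soft ceiling near 7x, and a receiver at 2x or 1x the DPD rate) can be collected into a simple sanity check. The thresholds are taken from this section; the function itself is illustrative and not part of the core's API:

```python
def check_dpd_rates(signal_bw_hz, dpd_fs_hz, rx_fs_hz, contiguous=True):
    """Check a proposed rate plan against the rules of thumb above.

    Returns a list of warning strings; an empty list means the
    configuration follows the guidance in this section.
    """
    warnings = []
    ratio = dpd_fs_hz / signal_bw_hz
    if ratio < 5:
        warnings.append("DPD rate below 5x signal bandwidth")
    if ratio > 7:
        warnings.append("excess bandwidth: wideband artifacts possible, "
                        "particularly with ARCH_SEL 2, 3 or 4")
    if rx_fs_hz == dpd_fs_hz:
        if not contiguous:
            warnings.append("1x receiver rate may be problematic for "
                            "non-contiguous carriers")
    elif rx_fs_hz != 2 * dpd_fs_hz:
        warnings.append("receiver rate is neither 1x nor 2x the DPD rate")
    return warnings
```

For example, a 20 MHz signal with DPD at 122.88 Msps and the receiver at 245.76 Msps (ratio 6.1x, receiver at 2x) passes with no warnings, while a 30 MHz signal at the same rates falls below the 5x rule.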
A direct conversion receiver can also be used. In this case, QMC will be unreliable unless the receiver is individually
and externally calibrated.
RF Performance
The performance of DPD is intimately related to the quality of the RF design. The RF bandwidth should be at least
five times the signal bandwidth, but special considerations may apply at the edge of the band, depending on the RF
filter line-up. These matters lie somewhat outside the scope of the digital design Xilinx offers. Within the RF bandwidth, Xilinx is unable to put limits on the amplitude and phase error that might be tolerated. Performance with RF paths worse than the Axis CDRSX2 test platform is unknown.
Parameters
The default and user-controllable settings described here normally give sufficient control for successful performance in most operational scenarios. However, DPD has a number of internal parameters and settings, and in some cases performance issues can be addressed by changing these. Contact Xilinx Support for assistance.
See also the Support section of this document for support stipulations.
Abbreviations
3G Third Generation
3GPP Third Generation Partnership Project
ACP Adjacent Channel Power
ACLR Adjacent Channel Leakage Ratio
ADC Analog-to-Digital Converter
BTS Base Transceiver Station
BUFG Global Buffer (Xilinx FPGA component)
CAPEX Capital Expenditure
CDRSX Common Digital Radio System – Xilinx Edition
CFR Crest Factor Reduction
CMP Configured Maximum Power
CPICH Common Pilot Channel
DAC Digital-to-Analog Converter
dB decibel
dBc dB relative to carrier
dBm dB relative to one milliwatt
dBFS dB relative to digital full-scale
DCH Dedicated Transport Channel
DCL Dynamic Control Layer
DCM Digital Clock Manager (Xilinx FPGA component)
DPCH Dedicated Physical Channel
DPD Digital Pre-Distortion
DUC Digital Up Conversion
ECF Estimation Core Function
FCC Federal Communications Commission
FIFO First In, First Out
FIR Finite Impulse Response
HSDPA High Speed Downlink Packet Access
ISR Interrupt Service Routine
LDMOS Laterally Diffused Metal Oxide Semiconductor (Field Effect Transistor)
LMB Local Memory Bus
LO Local Oscillator
LTE Long Term Evolution
LUT Lookup Table
MCP Maximum Capacity Power
Msps Mega-samples per second
MIMO Multiple Input Multiple Output
MP Memory-Polynomial
NSNL Non-Static Non-Linearity
ODD Over-Drive Detection
References
1. Xilinx Peak Cancellation Crest Factor Reduction (PC-CFR) V2.0 product page
Evaluation
An evaluation license is available for this core. The evaluation version operates in the same way as the full version for several hours, dependent on clock frequency. Once the evaluation period ends, the data output comprises a delayed version of the data input, and the host interface reports the EVAL_LICENSE_TIMEOUT status value (see Table 10). If you notice this behavior in hardware, it probably means you are using an evaluation version of the core. The Xilinx tools warn that an evaluation license is being used during netlist implementation. If a full license is installed, delete the old XCO file, then reconfigure and regenerate the core.
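Host software can detect the end of the evaluation period by comparing the status read over the host interface against the timeout value. A minimal sketch, in which both the register-reader callback and the numeric code are placeholders (the real value is given in Table 10, not here):

```python
# Placeholder status code; substitute the EVAL_LICENSE_TIMEOUT value
# documented in Table 10 of the datasheet.
EVAL_LICENSE_TIMEOUT = 0x0

def eval_license_expired(read_status, timeout_code=EVAL_LICENSE_TIMEOUT):
    """Return True when the host-interface status reports an evaluation
    timeout. `read_status` is a caller-supplied register reader, since
    register access is platform specific."""
    return read_status() == timeout_code
```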
Support
Xilinx provides technical support for this LogiCORE product when used as described in the product
documentation. Xilinx cannot guarantee timing, functionality, or support of product if implemented in devices that
are not defined in the documentation, if customized beyond that allowed in the product documentation, or if
changes are made to any section of the design labeled DO NOT MODIFY.
Refer to the IP Release Notes Guide (XTP025) for further information on this core. The guide links to all the DSP IP and then to each core. Each core has a master Answer Record containing its Release Notes and Known Issues list. The following information is listed for each version of the core:
• New Features
• Bug Fixes
• Known Issues
Ordering Information
This core may be downloaded from the Xilinx IP Center for use with the Xilinx CORE Generator software v12.3 and
later. The Xilinx CORE Generator system is shipped with Xilinx ISE Design Suite development software.
To order Xilinx software, contact your local Xilinx sales representative.
Information on additional Xilinx LogiCORE IP modules is available on the Xilinx IP Center.
Revision History
The following table shows the revision history for this document:
Notice of Disclaimer
Xilinx is providing this product documentation, hereinafter “Information,” to you “AS IS” with no warranty of any kind, express
or implied. Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any
claims of infringement. You are responsible for obtaining any rights you may require for any implementation based on the
Information. All specifications are subject to change without notice. XILINX EXPRESSLY DISCLAIMS ANY WARRANTY
WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR ANY IMPLEMENTATION BASED
THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT THIS
IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Except as stated herein, none of the Information may be
copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means
including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of
Xilinx.