XAPP523: LVDS 4x Asynchronous Oversampling
Summary
This application note describes a method of capturing asynchronous communication using
LVDS with SelectIO™ interface primitives. The method consists of oversampling the data with
a clock of similar frequency (±100 ppm). This oversampling technique involves taking multiple
samples of the data at different clock phases to get a sample of the data at the most ideal point.
The SelectIO interface in 7 series FPGAs and Zynq®-7000 All Programmable SoCs can
perform 4x asynchronous oversampling at 1.25 Gb/s. Oversampling is performed by using
ISERDESE2 primitives. Clocks are generated from a mixed-mode clock manager
(MMCME2_ADV) through dedicated high-performance paths between the components.
Introduction
Synchronizing the clock and data is the most common method of achieving communication
between devices using low-voltage differential signaling (LVDS). This means that the clock is
transmitted on one differential channel and the data on one or several other differential pairs. At
the receiver, the clock (after synchronization) is used to capture the data. This is known as
source-synchronous communication.
When transmitting data without a separate accompanying clock signal, the clock used to
capture the data must be recovered at the receiver side from the incoming data stream. This is
called asynchronous communication, also known as data and/or clock recovery. Xilinx®
transceivers use this principle. Data recovery allows a receiver to extract data from the
incoming clock/data stream and then move the data into a new clock domain. Sometimes, the
recovered clock is used for onward data treatment or transmission.
The circuit described in this application note provides a “partial solution” in that no clock is
actually recovered, but the arriving data is fully extracted. Figure 1 shows a typical use case.
© Copyright 2012–2017 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and other designated brands included herein are trademarks of Xilinx in the
United States and other countries. All other trademarks are the property of their respective owners.
Figure 1: Typical use case (block diagram that includes an MMCM)
Asynchronous Oversampling
For signal processing, “oversampling” means sampling a signal using a sampling frequency
significantly higher than twice the bandwidth (or highest frequency) of the signal being
sampled. For the communication interface described in this application note, the “significantly
higher” sampling frequency is obtained using different edges of multiple phase-shifted clocks. It
is called asynchronous oversampling because the clocks used to create the sampling
frequency are nominally equal to the data stream frequency.
The circuit discussed here uses a clock (local oscillator) running at the same nominal frequency
as the data stream being captured. “Nominal” here means that the local oscillator is either
slightly faster or slightly slower than the incoming clock/data stream.
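As a rough illustration of why the sample point has to be tracked continuously, the sketch below estimates how many bits pass before a frequency offset moves the data by one sample spacing. The helper name `bits_per_phase_step` is hypothetical, the 4x spacing of 0.25 UI follows from the oversampling ratio, and the 200 ppm case is an assumed worst case in which both ends are off by 100 ppm in opposite directions.

```python
def bits_per_phase_step(offset_ppm, samples_per_bit=4):
    """Bits (unit intervals) needed for the data to drift by one sample spacing."""
    if offset_ppm <= 0:
        raise ValueError("offset_ppm must be positive")
    drift_per_bit_ui = offset_ppm * 1e-6   # drift accumulated per bit, in UI
    step_ui = 1.0 / samples_per_bit        # spacing between sample points, in UI
    return step_ui / drift_per_bit_ui


BIT_RATE_BPS = 1.25e9  # line rate quoted in the Summary

# 100 ppm is the figure from the Summary; 200 ppm is an assumed worst case.
for ppm in (100, 200):
    bits = bits_per_phase_step(ppm)
    micros = bits / BIT_RATE_BPS * 1e6
    print(f"{ppm} ppm offset: one sample step every {bits:.0f} bits ({micros:.2f} us)")
```

At 100 ppm the data slides by one sample spacing roughly every 2500 bits, so the data recovery logic must keep re-evaluating which sample to use.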
Through the use of a clock manager (MMCME2), high-speed phase-shifted clocks are
generated from a slow system clock typically provided by a local clock oscillator (see Figure 2).
Figure 2: Clocking overview. Recoverable labels: 125 MHz reference, BUFIO CLK and BUFIO CLK90 (to the ISERDES, OVERSAMPLE mode), BUFG IntClkDiv (to the data recovery logic in this clock area), OSERDES, ISERDES, and state machine.
The function of the two extra clocks and ISERDES/OSERDES combination shown in Figure 2
is explained in Clocking and Data Flow, page 10. The generated CLK and CLK90 clocks make
it possible to oversample an incoming data stream on four edges, meaning that each bit of the
DDR data stream can be sampled twice, as shown in Figure 3.
Figure 3: CLK and CLK90 sampling edges at 0°, 90°, 180°, and 270°, shown against different positions of the data with respect to the clocks.
If the incoming data stream is split into two branches and one branch is delayed by 45°, it is
possible to 4x oversample every data bit. The details of how this circuit is constructed using
MMCME2, IODELAYE2, and ISERDESE2 are shown in Figure 4.
Figure 4: Oversampling circuit. Recoverable labels: CLKIN, BUFIO CLK, BUFIO CLK90, IBUFDS_DIFF_OUT, one IDELAY (0 shift) feeding an ISERDES, and one IDELAY (45 shift) feeding a second ISERDES.
The MMCME2 generates two clock phases (CLK0 and CLK90). These are routed to the
ISERDESE2, where both the positive-transitioning and negative-transitioning edges of the two
clocks are used, creating four clock phases. Two copies of the incoming data are created by
means of an IBUFDS_DIFF_OUT. One branch of the data gets a shift of 45° and the other
branch gets no phase shift. The phase shift is obtained by passing the data of both branches
through IODELAYE2. This phase-shifted version of the data is passed into a slave
ISERDESE2, effectively doubling the sample rate.
Eight clock sample phases for bit oversampling are thus created by using a combination of four
clock phases and two data sample phases, as shown in Figure 5.
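The combination of four clock edges and two data branches can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the reference design: the names are illustrative, and it assumes that delaying the data by 45° is equivalent to sampling the undelayed data 45° earlier.

```python
CLK_FREQ_HZ = 625e6                      # CLK and CLK90 frequency
CLK_PERIOD_PS = 1e12 / CLK_FREQ_HZ       # 1600 ps clock period
BIT_RATE_BPS = 1.25e9                    # DDR data rate
UI_PS = 1e12 / BIT_RATE_BPS              # 800 ps per bit
DATA_DELAY_DEG = 45                      # delay of the second data branch

CLOCK_EDGES_DEG = [0, 90, 180, 270]      # both edges of CLK and CLK90


def deg_to_ps(deg):
    """Convert a phase in degrees of the 625 MHz clock to picoseconds."""
    return deg / 360.0 * CLK_PERIOD_PS


def effective_sample_phases():
    """Sampling phases seen by the original data stream, in clock degrees."""
    phases = set()
    for edge in CLOCK_EDGES_DEG:
        phases.add(edge % 360)                      # undelayed branch
        phases.add((edge - DATA_DELAY_DEG) % 360)   # branch delayed by 45 degrees
    return sorted(phases)


phases = effective_sample_phases()
spacing_ps = deg_to_ps(DATA_DELAY_DEG)
samples_per_bit = UI_PS / spacing_ps

for p in phases:
    print(f"{p:3d} deg = {deg_to_ps(p):6.1f} ps")
print(f"{len(phases)} phases, {spacing_ps:.0f} ps apart, {samples_per_bit:.0f} samples per bit")
```

Eight phases spaced 200 ps apart across a 1600 ps clock period give four samples in every 800 ps bit, which is the 4x oversampling described in the text.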
Figure 5: CLK and CLK90 edges at 0°, 90°, 180°, and 270° shown against DATA and DATA delayed by 45°, giving the combined sampling phases.
7 Series ISERDESE2 Oversampling Mode
The ISERDESE2 component in 7 series FPGAs is an improved version of similar components
in previous FPGA families (ISERDES in Virtex-5 FPGAs and ISERDESE1 in Virtex-6 FPGAs).
The ISERDESE2 component can implement (i.e., be configured as) different functions:
• In its most basic function, the ISERDESE2 provides the functionality of an IDDR flip-flop.
• A second and more complex function is that of a dedicated serial-to-parallel converter with
specific clocking and logic features designed to facilitate the implementation of high-speed
source-synchronous applications (NETWORKING mode).
• A third function is MEMORY mode, where the ISERDES is configured as a dedicated
interface for different types of memories (QDR, DDR3, etc.).
• In its fourth and final function, the ISERDESE2 can be used in OVERSAMPLING mode.
Here, ISERDESE2 is used to capture two phases of DDR data. In this mode, the
ISERDESE2 is thus used as a dual set of IDDR flip-flops.
For detailed descriptions of the ISERDESE2 functionality, see UG471, 7 Series FPGAs
SelectIO Resources User Guide.
For convenience, Figure 6 shows the ISERDESE2, with oversampling mode configuration. In
earlier implementations, the oversampling design was implemented in FPGA logic using SLICE
flip-flops. With 7 series FPGAs, this functionality is implemented in the ISERDESE2.
Figure 6: ISERDESE2 in oversampling mode.
Attribute settings shown in the figure:
INTERFACE_TYPE : string := "OVERSAMPLE";
SERDES_MODE : string := "MASTER";
DATA_WIDTH : integer := 4;
DATA_RATE : string := "DDR";
OFB_USED : string := "FALSE";
IOBDELAY : string := "IFD";
NUM_CE : integer := 1;
DYN_CLKDIV_INV_EN : string := "FALSE";
DYN_CLK_INV_EN : string := "FALSE";
INIT_Q1 : bit := '0';
INIT_Q2 : bit := '0';
INIT_Q3 : bit := '0';
INIT_Q4 : bit := '0';
SRVAL_Q1 : bit := '0';
SRVAL_Q2 : bit := '0';
SRVAL_Q3 : bit := '0';
SRVAL_Q4 : bit := '0';
Ports shown on the ISERDESE2 symbol: SHIFTIN1, SHIFTIN2, OFB, D, DDLY, CE1, CE2, RST, BITSLIP, CLK, CLKB, CLKDIV, CLKDIVP, DYNCLKDIVSEL, DYNCLKSEL, OCLK, OCLKB, SHIFTOUT1, SHIFTOUT2, O, and Q1 through Q8.
Detail view: DDLY feeds four register chains. The chain for Q1 is clocked by CLK, Q2 by CLKB, Q3 by OCLK, and Q4 by OCLKB.
Figure 7: Sample points Q1M0, Q3M0, Q2M0, Q4M0, Q1M1, Q3M1, Q2M1, and Q4M1 on the master data (tap 0, delay = 0 ps), with zones labeled R0, F0, R1, and F1 and a 200 ps spacing marked.
The data is sampled through four clock phases 400 ps or 90° apart and named CLK0, CLK90,
CLK180, and CLK270, as shown in Figure 3, page 3. Sampling points occur where the clocks
intersect the data streams. These points are named according to the format:
Qx [M or S]x
Where:
Qx = the ISERDESE2 outputs Q1, Q2, Q3, or Q4
Mx or Sx = the source ISERDESE2 (M = master, S = slave) of the data outputs (Qx)
For example, sample point Q1M1 shows where CLK0 samples the data and creates an output
at port Q1 of the master ISERDESE2.
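The naming format can be captured in a small helper, shown here as a hedged sketch. The function `parse_sample_point` is hypothetical. The text states only that Q1 corresponds to CLK0; the remaining entries of `Q_TO_CLOCK` (Q2 = CLK180, Q3 = CLK90, Q4 = CLK270) are assumptions drawn from the state machine discussion later in this note. The meaning of the trailing digit is not spelled out in the text, so it is returned unchanged as `index`.

```python
# Assumed mapping of ISERDESE2 outputs to sampling clock phases.
Q_TO_CLOCK = {"Q1": "CLK0", "Q2": "CLK180", "Q3": "CLK90", "Q4": "CLK270"}


def parse_sample_point(name):
    """Split a sample point name such as 'Q1M1' into its parts."""
    if len(name) < 4 or name[2] not in ("M", "S"):
        raise ValueError(f"not a sample point name: {name!r}")
    port = name[:2]
    if port not in Q_TO_CLOCK:
        raise ValueError(f"unknown ISERDESE2 output: {port!r}")
    return {
        "port": port,                                   # ISERDESE2 output Q1..Q4
        "clock": Q_TO_CLOCK[port],                      # assumed sampling clock phase
        "source": "master" if name[2] == "M" else "slave",
        "index": int(name[3:]),                         # trailing digit, kept as-is
    }


print(parse_sample_point("Q1M1"))
print(parse_sample_point("Q4S0"))
```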
The lines labeled E4[0] through E4[3] that connect the sample points show where the DRU is
comparing data and looking for a data edge. The formulas for the four comparisons are shown
in Equation 1 through Equation 4.
E4[0] = [Q1M1 xor Q1S1] or [Q2M1 xor Q2S1]     (Equation 1)
sampled by CLK90. Again, the sample points are 200 ps apart relative to the original data
stream. For each comparison, one sample point falls in a rising-edge zone and the other falls in
a falling-edge zone. These two comparisons would produce an xor result of 1, indicating that an
edge (level transition) exists somewhere between the two sample points in each comparison.
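Equation 1 translates directly into bitwise logic. The sketch below models only that one comparison, because it is the only one written out in this text; the function name `e4_0` and the use of Python integers 0 and 1 for sample values are illustrative choices.

```python
def e4_0(q1m1, q1s1, q2m1, q2s1):
    """Equation 1: flag an edge between the master and slave samples of Q1 or Q2."""
    return (q1m1 ^ q1s1) | (q2m1 ^ q2s1)


# Master and slave agree on both outputs: no edge between the sample points.
print(e4_0(1, 1, 0, 0))

# Master and slave disagree on Q1: an edge lies between Q1M1 and Q1S1.
print(e4_0(1, 0, 0, 0))
```

Because the master and slave samples are taken 200 ps apart, a differing pair localizes the transition to that 200 ps window.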
Figure 8 shows what Equation 1 through Equation 4 look like in logic and how the data flows
out of the ISERDESE2 and into that logic. A stage of registers between the ISERDESE2 and
the logic facilitates the timing. This also shows how the Q4 output of the slave ISERDESE2 is
stored from the previous sample set to be compared with the new sample set.
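Figure 8: Data flow from the ISERDESE2 outputs into the edge detection logic, clocked by the 625 MHz BUFG clock. Master ISERDESE2 outputs Q1, Q2, Q3, and Q4 are registered as Q(1)/II(1), Q(5)/II(5), Q(3)/II(3), and Q(7)/II(7). A second ISERDESE2 is shown with Q1 registered as Q(0)/II(0) and Q4 as Q(6)/II(6); its remaining labels are not recoverable. The comparison outputs E4(0), E4(1), and E4(3) are labeled.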
At this point, it should be clear how the data comes into the FPGA and is then fed into the DRU
for edge detection. The next step in the DRU is to process the comparison data. A simple
state machine, based on where the data edge was and where it moves to, chooses a
sample point away from the data edge.
The ideal sample point can be expected to move around because of voltage and temperature
variations, jitter, and offset between the source and receiver clocks. This means that the
comparison point equations are always changing value, and the state machine is always
updating based on these changing results. Figure 9 and Table 1 describe the flow of the state
machine from one set of data to the next.
Figure 9: State diagram with states 00, 01, 10, and 11. Transitions between the states are labeled with E4(0) through E4(3).
In Table 1, the EQ column shows the current position of the state machine with input from
Equation 1 through Equation 4. The DO column shows what sample set is used in the
interconnect logic. Remembering that each ISERDESE2 in oversampling mode acts as two
sets of IDDR flip-flops, DO indicates which IDDR flip-flops should be used as the ideal sample
points.
Figure 9 shows, for each given state or sample set, where the state machine would go next. For
example, assume the state machine starts in state 01, which uses the Q(1) and Q(5) signals.
This maps to Q1 and Q2 out of the ISERDESE2 master, which would be CLK0 and CLK180,
respectively.
Then, if the data edge were to move to the left, the center point would be shifted from
CLK0/CLK180 to CLK90/CLK270. This would be seen by E4(3) changing its value from 0 to 1.
When this happens, the state machine moves from state 00 to state 01.
Bit Skip
When an edge moves to the left of the first data bit sample, bit skip occurs. Bit skip also occurs
when an edge moves to the right of the last data bit sample.
When an edge is detected to the left of the last sample, the new current sample is moved from
the last sample to the right corresponding to the first sample of the next data. As shown in
Table 1, when the state machine is in state 10, it samples Q(3) and Q(7). The state machine
then changes to state 00, sampling Q(0) and Q(4). However, a sample of data was already
taken in the previous state when the state machine was 10. As a result, during the first 00 state
of the state machine, one bit of the sampled bits is dropped. This is called a negative bit skip.
Negative bit skip outputs five bits per clock.
When the edge is detected to the right of the first sample, the new current sample is moved to
the left, which corresponds to the last sample of the next data. As shown in Table 1, when the
state machine is at 00, it samples Q(0) and Q(4). The state machine then changes to state 10.
In this state, it samples Q(3) and Q(7). However, no sample of data was taken during states 00
and 10. As a result, during state 10 of the state machine, the last sample is taken along with
current samples, causing seven bits to be output. This is called a positive bit skip. Positive bit
skip outputs seven bits per clock.
From Figure 9, therefore, it can be seen that bit skip occurs when there is a transition between
states 00 and 10. When there are no bit-skip conditions, the sampled data output has one bit
per clock in the SDR mode and two bits per clock in the DDR mode.
Therefore, for six bits of parallel data:
• The number of bits for a negative bit skip condition is five.
• The number of bits for a positive bit skip condition is seven.
• The number of bits for no bit skip condition is six.
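The bit counts in this list can be expressed as a function of the state transition. This is a minimal sketch under the assumptions stated in the text: six bits is the nominal count, a transition from state 10 to state 00 is a negative bit skip, and a transition from state 00 to state 10 is a positive bit skip. The function name and the string encoding of states are illustrative, not taken from the reference design.

```python
NOMINAL_BITS = 6  # bits of parallel data when no bit skip occurs


def bits_this_cycle(prev_state, new_state):
    """Number of valid output bits produced for a given state transition."""
    if prev_state == "10" and new_state == "00":
        return NOMINAL_BITS - 1   # negative bit skip: one bit is dropped
    if prev_state == "00" and new_state == "10":
        return NOMINAL_BITS + 1   # positive bit skip: one extra bit is taken
    return NOMINAL_BITS           # any other transition, or no transition


# Example walk through a hypothetical sequence of state machine states.
states = ["00", "00", "10", "10", "00", "01"]
counts = [bits_this_cycle(a, b) for a, b in zip(states, states[1:])]
print(counts, "total bits:", sum(counts))
```

Downstream logic therefore has to accept a variable number of bits per clock, which is consistent with the data being presented with a clock enable.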
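Figure 10: Clocking and data flow. Recoverable labels: stages 2A and 2B; IBUFDS_DIFF_OUT feeding two ISERDES, followed by a register and the DRU logic (IntClkDiv); IDELAYCTRL clocked by BUFG ClkRef; MMCM with 125 MHz input driving BUFIO CLK, BUFIO CLK90, BUFG IntClk, and BUFG IntClkDiv; OSERDES with a 4-bit pattern, a further ISERDES, and the state machine.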
ISERDESE2 and the clocks for the DRU logic are phase-aligned using a state
machine. It is also very important that the delay from ISERDESE2 to the registers in
the DRU does not exceed 600 ps.
b. In this stage of the DRU, data is transferred from the 625 MHz BUFG clock domain to
the 312.5 MHz BUFG clock domain. These clocks are in phase with each other, so there
are no special requirements.
3. Data/clock presentation
The 10-bit data from the DRU is presented to the user interconnect logic at a 312.5 MHz
clock rate with a clock enable.
4. Clock alignment state machine
BUFIO and BUFG have an undefined phase relationship with each other (length of routing,
delay of clock buffers, etc.). To transfer data between both clock domains, CDC logic is
needed or, as used here, the clocks must be phase-aligned. To perform clock phase
alignment, a calibration scheme is set up. The clock alignment circuit uses the knowledge
that all I/Os in an FPGA I/O bank are identical and therefore have identical timing.
• An OSERDESE2 is loaded with a fixed data pattern and is clocked by the clocks from
the BUFG clock tree (IntClk, IntClkDiv). The output of the OSERDESE2 is a clock
pattern at the IntClk (625 MHz, BUFG) clock rate. Through the feedback path, the
clock pattern is captured by the neighbor ISERDESE2 running from the BUFIO clock
tree. The data-capturing ISERDESE2 runs on this same clock tree.
• Using this technique, it is possible to measure the phase relationship between the two
clocks. Using the independent phase-shift capability of the MMCM with a small state
machine, the BUFG clocks are phase-shifted to phase-match the BUFIO clocks. Along
with the CLK clock (625 MHz), the CLK90 clock (625 MHz) is phase-shifted, and with
the IntClk clock (625 MHz), the IntClkDiv clock (312.5 MHz) is phase-shifted.
The phase calibration process is illustrated in Figure 11.
Figure 11: Phase calibration of the 625 MHz BUFG clock in three stages: (1) initial alignment, (2) initial phase adjustment alignment, and (3) final alignment. The ISERDESE2 output at stages 1, 2, and 3 is 1010, 0101, and 1010, respectively.
To allow for this clocking scheme, correct configuration of the MMCME2 is necessary. An
example of the MMCM configuration is provided in the next section.
As shown in Figure 11, one MMCME2 is used as clock source for the interface. This means that
the MMCME2 must deliver these clocks:
• ClkRef, ideally running at 312.5 MHz but limited to 310 MHz by the IDELAYCTRL
component parameters (REFCLK frequency = 300 MHz ±10 MHz)
• CLK at 625 MHz via BUFIO
• CLK90 (CLK shifted by 90°) at 625 MHz via BUFIO
• IntClk at 625 MHz via BUFG
• IntClkDiv at 312.5 MHz via BUFG
From DS182, Kintex-7 FPGAs Data Sheet: DC and Switching Characteristics, the MMCM
switching characteristics for the -2 speed grade are:
• MMCM_FIN_MIN = 10 MHz
• MMCM_FIN_MAX = 933 MHz
• MMCM_FVCO_MIN = 600 MHz
• MMCM_FVCO_MAX = 1440 MHz
• MMCM_FOUT_MIN = 4.69 MHz
• MMCM_FOUT_MAX = 933 MHz
• MMCM_FPFD_MIN = 10 MHz
• MMCM_FPFD_MAX = 500 MHz (Bandwidth set to High or Optimized)
Formulas:

$$D_{MIN} = \mathrm{RoundUp}\left(\frac{F_{IN}}{F_{PFD\_MAX}}\right) = 1 \qquad \text{(Equation 5)}$$

$$D_{MAX} = \mathrm{RoundDown}\left(\frac{F_{IN}}{F_{PFD\_MIN}}\right) = 12 \qquad \text{(Equation 6)}$$

$$M_{MIN} = \mathrm{RoundUp}\left(\frac{F_{VCO\_MIN}}{F_{IN}} \cdot D_{MIN}\right) = 5 \qquad \text{(Equation 7)}$$

$$M_{MAX} = \mathrm{RoundDown}\left(\frac{D_{MAX} \cdot F_{VCO\_MAX}}{F_{IN}}\right) = 138 \qquad \text{(Equation 8)}$$

$$M_{IDEAL} = \frac{D_{MIN} \cdot F_{VCO\_MAX}}{F_{IN}} = 11.52 \qquad \text{(Equation 9)}$$
To let the PLL inside the MMCME2 run in optimal conditions, FVCO must be maximized while
the maximum phase-frequency detector frequency (FPFD_MAX) is not exceeded.
The VCO frequency is given by Equation 10.
$$F_{VCO} = F_{IN} \cdot \frac{M}{D} \qquad \text{(Equation 10)}$$
D is calculated as 1 and the M value must be between 5 and 138. With an input clock of
125 MHz, if M is taken as 10, the VCO frequency is 1250 MHz. (Taking M as 12 provides a VCO
frequency of 1500 MHz, which is too high).
$$F_{VCO} = F_{IN} \cdot \frac{M}{D} = 125\ \text{MHz} \cdot \frac{10}{1} = 1250\ \text{MHz} \qquad \text{(Equation 11)}$$
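The divider and multiplier bounds can be reproduced numerically from the data sheet limits listed above. This is a sketch, not a replacement for the clocking wizard. It computes D_MAX from F_PFD_MIN and M_MAX from F_VCO_MAX, since those inputs reproduce the quoted values of 12 and 138. The final filter, which keeps only multipliers whose VCO frequency is an integer multiple of 625 MHz, is an assumption added here to illustrate why M = 10 is a natural choice.

```python
import math

F_IN = 125.0                               # MHz, input clock
F_PFD_MIN, F_PFD_MAX = 10.0, 500.0         # MHz, phase-frequency detector limits
F_VCO_MIN, F_VCO_MAX = 600.0, 1440.0       # MHz, VCO limits for the -2 speed grade

d_min = math.ceil(F_IN / F_PFD_MAX)
d_max = math.floor(F_IN / F_PFD_MIN)
m_min = math.ceil(F_VCO_MIN / F_IN * d_min)
m_max = math.floor(d_max * F_VCO_MAX / F_IN)
m_ideal = d_min * F_VCO_MAX / F_IN


def f_vco(m, d=1):
    """VCO frequency in MHz for multiplier m and input divider d."""
    return F_IN * m / d


def vco_in_range(m, d=1):
    """True when the VCO frequency is inside the data sheet limits."""
    return F_VCO_MIN <= f_vco(m, d) <= F_VCO_MAX


# Assumption: the 625 MHz outputs need an integer divider from the VCO.
candidates = [m for m in range(m_min, m_max + 1)
              if vco_in_range(m) and f_vco(m) % 625.0 == 0]

print("D range:", d_min, "to", d_max)
print("M range:", m_min, "to", m_max, "ideal", m_ideal)
print("M candidates for 625 MHz outputs:", candidates, "chosen:", max(candidates))
```

With D = 1, the highest multiplier that keeps the VCO in range and still divides evenly to 625 MHz is 10, matching the 1250 MHz VCO frequency used in the text.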
$$F_{OUT} = F_{IN} \cdot \frac{M}{D \cdot O} \qquad \text{(Equation 12)}$$
Where:
O is the divider factor of the output counter of the MMCME2 clock output.
D is the value used in the MMCME2 attributes.
• The CLK0 output of the MMCME2 accepts a real number value as divider. This is
perfect for generating 310 MHz from 1250 MHz.
- The D value is 4.0322.
• The CLK1 and CLK2 outputs of the MMCME2 are used to generate the 625 MHz
clocks that get distributed to the ISERDESE2 via the BUFIO clock buffers.
- The D value for both clocks is 2.
• Clock outputs CLK3 and CLK4 are then used to generate the IntClkDiv clock of
312.5 MHz and the IntClk clock of 625 MHz. These clock outputs must be
phase-shifted. Therefore, the attribute for the phase-shifting operation of the MMCM
must be turned on.
- The D value for CLK3 (312.5 MHz) is 4.
- The D value for CLK4 (625 MHz) is 2.
- The attribute to turn on for both is CLKOUTn_USE_FINE_PS.
• n = clock output (3 or 4).
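The output frequencies that follow from these divider values can be checked against Equation 12 with M = 10 and an input divider of 1. The dictionary keys below are descriptive labels chosen for this sketch, not attribute names, and the sketch does not check whether a given divider value is realizable in hardware.

```python
F_IN_MHZ = 125.0
M = 10            # multiplier chosen in the text
D_IN = 1          # input divider

# Output divider per MMCME2 clock output, as listed in the text.
OUTPUT_DIVIDERS = {
    "CLK0 output (ClkRef)": 4.0322,
    "CLK1 output (625 MHz BUFIO clock)": 2,
    "CLK2 output (625 MHz BUFIO clock)": 2,
    "CLK3 output (IntClkDiv)": 4,
    "CLK4 output (IntClk)": 2,
}


def f_out(o):
    """Equation 12: output frequency in MHz for output divider o."""
    return F_IN_MHZ * M / (D_IN * o)


for name, divider in OUTPUT_DIVIDERS.items():
    print(f"{name}: {f_out(divider):.2f} MHz")
```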
Reference Design
The reference design files are available for download at:
https://secure.xilinx.com/webreg/clickthrough.do?cid=184349
Table 2 shows the reference design checklist.
Table 3 shows a summary of the reference design utilization. The reference design contains a
dual receiver. For demonstration and test purposes, a set of PRBS transmitters and a set of
PRBS reception blocks is added on the KC705 board. Only the design utilization of the dual
receiver is listed. The XC7K325T-2-FFG900 device is used for implementation.
Notes:
1. One MMCME2 is needed per I/O bank. One I/O bank can fit up to 19 receiver channels.
One set of ISERDESE2/OSERDESE2 must be kept free for the clock phase adjustment
(CDC logic).
The design uses floorplanning constraints in the UCF file. This implementation technique
makes each receiver use the same amount of FPGA logic and allows easy multiplication of the
design in any 7 series FPGA. For more information about this, refer to the documentation inside
the reference design ZIP file.
Receiver UI and Jitter Tolerance
Given the DRU method used, two valid sampling points are required at all times. This means
the starting point is 0.500 UI. The oversampling is based on evenly spaced sampling points.
Any error in that spacing causes the receiver jitter eye requirement to increase.
Receiver Jitter Eye Requirement = Eye Requirement of DRU + Sampling Phase Error
0.625 UI = (0.500 UI) + (0.125 UI)
Sampling phase error comprises all of the effects of taking a 125 MHz clock and multiplying it
up to 625 MHz, then feeding it to the two BUFIOs with a phase shift and the IODELAYE2’s
200 ps phase shift.
Sampling phase error includes:
• MMCME2_ADV jitter for the exact setting in the reference design
• MMCME2_ADV phase error between CLK0 and CLK90
• MMCME2_ADV DCD
• IODELAYE2 delay accuracy (ability to create a 200 ps phase shift)
• IODELAYE2 pattern-dependent jitter
• Any offset between the two paths of the master and slave ISERDESE2
Sampling phase error does not include:
• Any other frequency of the clock or setting of the MMCME2_ADV
• Signal integrity losses (ISI, board jitter, etc.)
• Internal device jitter
Process, voltage, and temperature characterization is performed to qualify the interface. The
allowable overall jitter tolerance is 0.375 UI.
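The jitter budget can be restated in picoseconds for the 1.25 Gb/s line rate. The sketch assumes that the 0.375 UI tolerance is what remains of one unit interval after the 0.625 UI eye requirement is subtracted, which is consistent with the numbers given but is not stated explicitly in the text.

```python
BIT_RATE_BPS = 1.25e9
UI_PS = 1e12 / BIT_RATE_BPS               # one unit interval in ps

DRU_EYE_UI = 0.500                        # eye requirement of the DRU
SAMPLING_PHASE_ERROR_UI = 0.125           # sampling phase error

eye_requirement_ui = DRU_EYE_UI + SAMPLING_PHASE_ERROR_UI
jitter_tolerance_ui = 1.0 - eye_requirement_ui   # assumed: remainder of one UI

print(f"UI = {UI_PS:.0f} ps")
print(f"Receiver eye requirement = {eye_requirement_ui} UI = {eye_requirement_ui * UI_PS:.0f} ps")
print(f"Jitter tolerance = {jitter_tolerance_ui} UI = {jitter_tolerance_ui * UI_PS:.0f} ps")
```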
Reference Design
The design is highly hierarchical. This provides great flexibility and allows easy reuse of design
modules. The top structure of the design is shown in Figure 12.
Figure 12: Top structure of the design
Figure 13 and Figure 14 show the directory structure of the Common and SgmiiReceiver
folders, respectively.
Figure 13: Directory structure of the Common folder
Figure 14: Directory structure of the SgmiiReceiver folder
SgmiiReceiver is the only part of the design needed for custom implementation. This is also
the part discussed in this application note.
Conclusion
Xilinx 7 series FPGAs can implement asynchronous communication using SelectIO interface
resources, allowing transceivers to be reserved for other uses. This implementation also helps
to reduce cost by making smaller FPGA selection possible.
Revision History
The following table shows the revision history for this document.

Date      Version  Description of Revisions
04/06/12  1.0      Initial Xilinx Release.
05/17/17  1.1      Added Zynq-7000 SoC to document title and Summary. Removed “GT”
                   before “transceivers” in Introduction and Conclusion. Updated
                   OSERDES and ISERDES connections in Figure 2. Updated OSERDES
                   connection in Figure 10. In Figure 11, updated BUFIO rate to 625 MHz.
Notice of Disclaimer
The information disclosed to you hereunder (the “Materials”) is provided solely for the selection and use of
Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS
IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS,
IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2)
Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of
liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the
Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or
consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage
suffered as a result of any action brought by a third party) even if such damage or loss was reasonably
foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to
correct any errors contained in the Materials or to notify you of updates to the Materials or to product
specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior
written consent. Certain products are subject to the terms and conditions of Xilinx’s limited warranty,
please refer to Xilinx’s Terms of Sale which can be viewed at https://www.xilinx.com/legal.htm#tos; IP
cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx
products are not designed or intended to be fail-safe or for use in any application requiring fail-safe
performance; you assume sole risk and liability for use of Xilinx products in such critical applications,
please refer to Xilinx’s Terms of Sale which can be viewed at https://www.xilinx.com/legal.htm#tos.