0% found this document useful (0 votes)
12 views124 pages

NB LTE Downlink Transmitter - Thesis - 2022

Uploaded by

Zoro Zoro
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views124 pages

NB LTE Downlink Transmitter - Thesis - 2022

Uploaded by

Zoro Zoro
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 124

Cairo University

Faculty of Engineering
Electronics and Electrical
Communications Department

DESIGN AND IMPLEMENTATION OF THE


PHYSICAL LAYER OF NARROW-BAND IOT
LTE DOWNLINK LOW POWER DIGITAL
TRANSMITTER

A Thesis submitted in partial


fulfillment of the requirements for
the Degree of Bachelor of Science

Ahmed Ragab Ahmed

Hassan Mohamed Elbaroudy

Khaled Mahmoud Ibrahim

Khaled Mostafa Abdelhakim

Salma Mohamed Sultan

Under the supervision of

Dr. Hassan Mostafa

Si-Vision

July, 2022
Acknowledgements

Words cannot express our gratitude to our professor Dr. Hassan Mostafa for his invaluable
patience and feedback. We also could not have undertaken this journey without his support, who
generously provided knowledge and expertise.

We are also grateful to our advisors from Si-Vision, Eng. Ramy Raafat, Eng. Khaled Ismail,
Eng. Mahmoud Samy, Eng. Mohamed Maher, Eng. Mohamed Talha, and Eng. Mahmoud
Tantawi for their time, effort, regular feedback sessions, and moral support.

Finally, we wouldn’t forget to mention our families, especially our parents, brothers, and sisters.
Their belief in us has kept our spirits and motivation high during this process.

II
Abstract

LTE cellular technology is now globally widespread. One of the most interesting applications
that this technology offers concerns the "Internet of Things", better known as IoT which gave
birth to several new technologies such as the Narrow-Band IoT (NB-IoT) which is a new cellular
technology introduced by the 3GPP Release-13. This technology provides a communication
standard for wide areas and its main feature is low power consumption.

In this thesis, the digital design and the FPGA implementation of the Narrowband Physical
Downlink Shared Channel (NPDSCH) for a low-power OFDM transmitter for NB-IoT
applications is proposed. The thesis shall go through all project’s steps starting from theoretical
concepts & extracting the specifications from the 3GPP Release-14 standard, going through the
detailed design of each block separately using MATLAB modeling then RTL implementation
and use the MATLAB functions to verify the RTL design, ending up with system integration &
ASIC synthesis. The whole developed system has been implemented on a Vertex-7 FPGA.

III
Table of Contents
Acknowledgements .................................................................................................................................... II
Abstract..................................................................................................................................................... III
Table of Contents ......................................................................................................................................IV
List of Figures........................................................................................................................................ VIII
List of Tables .............................................................................................................................................XI
List of Abbreviations .............................................................................................................................. XII
Chapter One ................................................................................................................................................ 1
1. Introduction ......................................................................................................................................... 1
1.1. Motivation : ................................................................................................................................. 1
1.2. OFDM overview : ........................................................................................................................ 2
1.3. Modes of operation : ................................................................................................................... 6
1.4. Frame Structure .......................................................................................................................... 6
1.5. Physical Resource Block Structure for NB-IoT ....................................................................... 8
1.6. Problem Description ................................................................................................................... 9
1.7. Thesis Organization .................................................................................................................... 9
Chapter Two .............................................................................................................................................. 10
2. Physical Downlink Transmitter Chain and sub-blocks’ function ................................................ 10
2.1. Transmitter Chain .................................................................................................................... 10
2.2. Sub-blocks’ function ................................................................................................................. 11
2.2.1. Cyclic Redundancy Check (CRC) ....................................................................................... 11
2.2.2. Encoder .................................................................................................................................. 11
2.2.3. Rate Matcher ......................................................................................................................... 12
2.2.4. Scrambler............................................................................................................................... 12
2.2.5. Modulation Mapper .............................................................................................................. 13
2.2.6. Resource Element Mapper ................................................................................................... 13
2.2.7. IFFT ....................................................................................................................................... 14
Chapter Three ........................................................................................................................................... 17
3. Standard Specifications and Assumptions ...................................................................................... 17
3.1. CRC ............................................................................................................................................ 17
3.2. Encoder ...................................................................................................................................... 18
3.3. Rate Matcher ............................................................................................................................. 19
3.4. Scrambler................................................................................................................................... 21
3.5. Modulation Mapper .................................................................................................................. 22
3.6. Resource Element Mapper ....................................................................................................... 23

IV
3.6.1. NRS Generation .................................................................................................................... 23
3.6.2. Narrowband Reference Signals Location (NRS index) ..................................................... 24
3.6.3. Narrowband Primary Synchronization Signals (NPSS) .................................................... 24
3.6.4. Data Rate Calculation for NPDSCH ................................................................................... 25
3.7. IFFT & CP ................................................................................................................................. 26
Chapter Four ............................................................................................................................................. 27
4. Design Implementation ..................................................................................................................... 27
4.1. CRC ............................................................................................................................................ 27
4.1.1. Block Diagram ....................................................................................................................... 27
4.1.2. Interface Table ...................................................................................................................... 27
4.1.3. Operation ............................................................................................................................... 28
4.1.4. FSM ........................................................................................................................................ 28
4.1.5. Simulation results.................................................................................................................. 29
4.1.6. Synthesis Results ................................................................................................................... 30
4.1.7. Latency ................................................................................................................................... 31
4.2. Encoder ...................................................................................................................................... 32
4.2.1. Block Diagram ....................................................................................................................... 32
4.2.2. Interface Table ...................................................................................................................... 32
4.2.3. FSM ........................................................................................................................................ 33
4.2.4. Operation ............................................................................................................................... 34
4.2.5. Simulation results.................................................................................................................. 35
4.2.6. Synthesis Results ................................................................................................................... 37
4.2.7. Latency ................................................................................................................................... 38
4.3. Rate Matcher ............................................................................................................................. 39
4.3.1. Block Diagram ....................................................................................................................... 39
4.3.2. Interface Table ...................................................................................................................... 40
4.3.3. General Steps of Operation .................................................................................................. 40
4.3.4. First design ............................................................................................................................ 41
4.3.4.1. Operation ........................................................................................................................... 41
4.3.4.2. Disadvantages of the design ............................................................................................. 43
4.3.5. Second design ........................................................................................................................ 43
4.3.5.1. Idea of optimization .......................................................................................................... 43
4.3.5.2. Operation ........................................................................................................................... 44
4.3.5.3. Enhancements and improvements in the second design: - ............................................ 47
4.3.6. Simulation results.................................................................................................................. 47

V
4.3.7. Synthesis Results ................................................................................................................... 51
4.4. Scrambler................................................................................................................................... 53
4.4.1. Block Diagram ............................................................................................................................. 53
4.4.2. Interface Table ............................................................................................................................ 53
4.4.3. Operation ..................................................................................................................................... 54
4.4.4. Challenges .............................................................................................................................. 55
4.4.5. Future work:.......................................................................................................................... 55
4.4.6. Simulation results.................................................................................................................. 56
4.4.7. Synthesis Results ................................................................................................................... 57
4.5. Modulation Mapper .................................................................................................................. 59
4.5.1. Block Diagram ............................................................................................................................. 59
4.5.2. Interface Table ............................................................................................................................ 59
4.5.3. Operation ..................................................................................................................................... 60
4.5.4. Simulation results.................................................................................................................. 61
4.5.5. Synthesis results .................................................................................................................... 61
4.6. Resource Element Mapper ....................................................................................................... 63
4.6.1. NRS Generation .................................................................................................................... 63
4.6.1.1. Block Diagram ................................................................................................................... 63
4.6.1.2. Interface Table .................................................................................................................. 63
4.6.1.3. Operation ........................................................................................................................... 64
4.6.1.4. FSM .................................................................................................................................... 65
4.6.2. NRS Location ........................................................................................................................ 67
4.6.2.1. Block Diagram ................................................................................................................... 67
4.6.2.2. Interface Table .................................................................................................................. 67
4.6.2.3. Operation ........................................................................................................................... 67
4.6.3. NPSS ....................................................................................................................................... 68
4.6.3.1. Block Diagram ................................................................................................................... 68
4.6.3.2. Interface Table .................................................................................................................. 68
4.6.3.3. Operation ........................................................................................................................... 68
4.6.4. Element Mapper.................................................................................................................... 69
4.6.4.1. Interface Table .................................................................................................................. 69
4.6.4.2. Operation ........................................................................................................................... 70
4.6.5. Resource Element Mapper TOP LEVEL ........................................................................... 72
4.6.5.1. Block Interface .................................................................................................................. 72
4.6.5.2. Architecture ....................................................................................................................... 72

VI
4.6.5.3. Interface Table .................................................................................................................. 73
4.6.5.4. Complete Frame Scenario ................................................................................................ 74
4.6.6. Simulation results of NRS Generation ................................................................................ 75
4.6.7. Simulation results of NRS Index.......................................................................................... 76
4.6.8. Simulation results of the TOP module ................................................................................ 77
4.6.9. Synthesis Results ................................................................................................................... 78
4.7. IFFT & CP ................................................................................................................................. 79
4.7.1. Block Interface ...................................................................................................................... 83
4.7.2. Interfaces Table ..................................................................................................................... 84
Architecture ........................................................................................................................................... 84
4.7.3. Blocks Description................................................................................................................. 85
4.7.4. Operation ............................................................................................................................... 85
4.7.5. Clock Determination ............................................................................................................. 87
4.7.6. Future Work .......................................................................................................................... 87
4.7.7. Simulation Results................................................................................................................. 88
4.7.8. Fraction bits number determination ................................................................................... 93
4.7.9. Synthesis Results ................................................................................................................... 95
Chapter Five .............................................................................................................................................. 97
5. System Integration ............................................................................................................................ 97
5.1. NB-IoT LTE Transmitter TOP LEVEL ................................................................................. 98
5.2. Top Level Interface Table ........................................................................................................ 98
5.3. Detailed NB-IoT LTE Transmitter Chain .............................................................................. 99
5.4. System’s Finite State Machine ............................................................................................... 100
5.5. Simulation Results .................................................................................................................. 101
5.6. System Latency........................................................................................................................ 104
5.7. Synthesis Results ..................................................................................................................... 105
5.7.1. Whole Chain ASIC Synthesis Results ............................................................................... 105
5.7.2. Summary for Power and Area results ............................................................................... 107
5.8. Full System Vivado Utilization for Vertex-7 ........................................................................ 108
Chapter Six .............................................................................................................................................. 109
6. FPGA Implementation ................................................................................................................... 109
Conclusion ............................................................................................................................................... 111
References ................................................................................................................................................ 112

VII
List of Figures
Figure 1: NB-IoT Applications .................................................................................................................. 1
Figure 2: Multi-path fading ....................................................................................................................... 2
Figure 3: ISI caused by delay spread and guard periods inserting ........................................................ 2
Figure 4: Frequency Selective Fading ....................................................................................................... 2
Figure 5: Inter-channel Interference......................................................................................................... 3
Figure 6: Symbol duration in time domain .............................................................................................. 3
Figure 7: Dividing the carrier into sub-carriers of lesser frequencies ................................................... 3
Figure 8: FDMA .......................................................................................................................................... 4
Figure 9: Orthogonal Frequency Multiple Access vs FDMA.................................................................. 4
Figure 10: OFDMA for downlink vs SC-FDMA for uplink.................................................................... 5
Figure 11: NB-IoT deployment scenarios ................................................................................................. 6
Figure 12: FDD vs TDD .............................................................................................................................. 7
Figure 13: Radio frame structure type 1 .................................................................................................. 8
Figure 14: Resource grid with 15 kHz spacing......................................................................................... 8
Figure 15: Transmitter Chain Block Diagram ....................................................................................... 10
Figure 16: The difference between tail-biting convolutional encoder and convolutional encoder .... 11
Figure 17: Physical Resource Block ........................................................................................................ 13
Figure 18: Multi-carrier System .............................................................................................................. 14
Figure 19: Simple Analog OFDM System Implementation .................................................................. 15
Figure 20: Simplified OFDM System Block Diagram ........................................................................... 15
Figure 21: Symbols Mapping by IFFT.................................................................................................... 16
Figure 22: Cyclic Prefix ............................................................................................................................ 16
Figure 23: Parity bits added to input bits ............................................................................................... 17
Figure 24: CRC Linear Feedback Shift Registers ................................................................................. 17
Figure 25: XORing of encoder's output streams .................................................................................... 18
Figure 26: Rate Matcher stages ............................................................................................................... 19
Figure 27: inter-column permutation pattern for sub-block interleaver............................................. 20
Figure 28: Gold sequence generation ...................................................................................................... 22
Figure 29: QPSK modulation mapping .................................................................................................. 22
Figure 30: QPSK Constellation ............................................................................................................... 22
Figure 31: Sub-frame (two slots) of the radio frame ............................................................................. 23
Figure 32: Mapping of NPSS in the frame ............................................................................................. 24
Figure 33: CRC top level .......................................................................................................................... 27
Figure 34: CRC Architecture................................................................................................................... 27
Figure 35: Generated & built-in MATLAB functions comparison of CRC ........................................ 29
Figure 36: Correct CRC bits in RTL as MATLAB ............................................................................... 29
Figure 37: CRC Power report.................................................................................................................. 30
Figure 38: CRC Area report .................................................................................................................... 30
Figure 39: Timing_max (setup time) report of CRC ............................................................................. 31
Figure 40: Tail-bit convolutional encoder top level ............................................................................... 32
Figure 41: tail-bit conv. encoder architecture ........................................................................................ 33
Figure 42: Generated & built-in second encoder output stream .......................................................... 35
Figure 43: Generated & built-in first encoder output stream .............................................................. 35
Figure 44: Generated & built-in third encoder output stream ............................................................. 35
Figure 45: Encoder output streams and valid signal ............................................................................. 36
Figure 46: Comparing MATLAB and RTL results ............................................................................... 36

VIII
Figure 47: Encoder Area report .............................................................................................................. 37
Figure 48: Encoder power report ............................................................................................................ 37
Figure 49: Encoder Timing_max (setup time) report ............................................................................ 38
Figure 50: Rate Matcher main architecture ........................................................................................... 39
Figure 51: Illustrative architecture of first design ................................................................................. 41
Figure 52: Resource Allocation................................................................................................................ 42
Figure 53: Register File Implementation ................................................................................................ 42
Figure 54: The used Permutation look-up table..................................................................................... 42
Figure 55: Illustrative architecture of second design............................................................................. 43
Figure 56: Serial-to-parallel module ....................................................................................................... 44
Figure 57: Register files and bit selection module.................................................................................. 45
Figure 58: Rate Matcher State Diagram................................................................................................. 46
Figure 59: comparison between MATLAB built in and our code ........................................................ 47
Figure 60: first 24 bits in RTL ................................................................................................................. 48
Figure 61: first 24 bits in MATLAB ........................................................................................................ 48
Figure 62: first 20 bits in RTL ................................................................................................................. 48
Figure 63: first 20 bits in MATLAB ........................................................................................................ 48
Figure 64: RTL final simulation results .................................................................................................. 48
Figure 65: states transition from reset to input read ............................................................................. 49
Figure 66: Input stream reading.............................................................................................................. 49
Figure 67: Bit selection and out_control new value ............................................................................... 49
Figure 68: Output state ............................................................................................................................ 50
Figure 69: Last output data...................................................................................................................... 50
Figure 70: Rate Matcher Area report ..................................................................................................... 51
Figure 71: Rate Matcher Power report................................................................................................... 51
Figure 72: Rate Matcher Timing_max (setup time) report .................................................................. 52
Figure 73: Scrambler Block Diagram ..................................................................................................... 53
Figure 74: Scrambler architecture .......................................................................................................... 54
Figure 75: Scrambler FSM ...................................................................................................................... 54
Figure 76: MATLAB output for scrambler NcellID = 65535, nf = 1, ns = 0........................................ 56
Figure 77: RTL output for scrambler NcellID = 65535, nf = 1, ns = 0 ................................................. 56
Figure 78: Scrambler area report............................................................................................................ 57
Figure 79: Scrambler Timing_max (setup time) report ........................................................................ 57
Figure 80: Scrambler power report ........................................................................................................ 58
Figure 81: Modulation Mapper Block Diagram .................................................................................... 59
Figure 82: Modulation Mapper symbol generation ............................................................................... 60
Figure 83: Modulation Mapper RTL output .......................................................................................... 61
Figure 84: Modulation Mapper MATLAB output ................................................................................ 61
Figure 85: Modulation Mapper area report ........................................................................................... 61
Figure 86: Modulation Mapper power report ........................................................................................ 62
Figure 87: Modulation mapper Timing_max (setup time) report ........................................................ 62
Figure 88: Block Interface of NRS generation ....................................................................................... 63
Figure 89: Two slots of the frame ............................................................................................................ 64
Figure 90: The two LFSRs of the NRS values ........................................................................................ 64
Figure 91: The 4-bit registers & LFSRs configuration.......................................................................... 64
Figure 92: Pilots memory elements ......................................................................................................... 65
Figure 93: illustrative diagram of datapath & controller of NRS values ............................................ 65

IX
Figure 94: NRS_Generation FSM State Diagram.................................................................................. 66
Figure 95: Interface Diagram of NRS Index .......................................................................................... 67
Figure 96: Interface Diagram of NPSS ................................................................................................... 68
Figure 97: Block Interface of Resource Element Mapper ..................................................................... 72
Figure 98: Top Level of Resource Element Mapper .............................................................................. 72
Figure 99: MATLAB comparison of NRS values at cell_id=300 & subframe=3 ............................... 75
Figure 100: RTL results of NRS values at cell_id=300 & subframe=3 ................................................ 75
Figure 101: Testing a wholre frame pilots using automated testbench................................................ 76
Figure 102: RTL results of NRS indices at cell_id=300 ......................................................................... 76
Figure 103: Matlab results of NRS indices at cell_id=300 .................................................................... 76
Figure 104: Output of the REM is the QPSK symbols requested by the IFFT ................................... 77
Figure 105: Test cases for a complete frame using automated testbench ............................................ 77
Figure 106: Timing_max (setup time) report of REM .......................................................................... 78
Figure 107: Area report of REM ............................................................................................................. 78
Figure 108: Power report of REM........................................................................................................... 78
Figure 109: Decimation in Frequency ..................................................................................................... 79
Figure 110: Decimation in Time .............................................................................................................. 80
Figure 111: Memory-based Architecture ............................................................................................... 81
Figure 112: Signal Flow Graph for 16-point FFT .................................................................................. 82
Figure 113: Automation ........................................................................................................................... 83
Figure 114: IFFT & CP Block Diagram ................................................................................................. 83
Figure 115: IFFT & CP Architecture ..................................................................................................... 84
Figure 116: IFFT & CP time-line ............................................................................................................ 86
Figure 117: OFDM Symbol's 137 cycles ................................................................................................. 86
Figure 118: Generated Input (fixed) ....................................................................................................... 88
Figure 119: Generated Input (float) ........................................................................................................ 88
Figure 120: Generated Test-bench fragment ......................................................................................... 89
Figure 121: Expected Output (float) ....................................................................................................... 89
Figure 122: IFFT output to clarify the CP ............................................................................................. 90
Figure 123: CP duration ........................................................................................................................... 90
Figure 124: IFFT output to clarify the OFDM duration ....................................................................... 91
Figure 125: IFFT output to clarify the continuity of the output........................................................... 91
Figure 126: IFFT output for 20 OFDM symbols ................................................................................... 91
Figure 127: Final Comparison File.......................................................................................................... 92
Figure 128: Quantization error vs Fraction length ................................................................................ 94
Figure 129: Area report of IFFT & CP................................................................................................... 95
Figure 130: Timing_max (setup time) report of IFFT & CP ................................................................ 95
Figure 131: Power report of IFFT & CP ................................................................................................ 96
Figure 132: Illustrative Diagram of our integration approach ............................................................. 97
Figure 133: NB-IoT Tx Top Level ........................................................................................................... 98
Figure 134: Detailed Architecture ........................................................................................................... 99
Figure 135: Datapath & Controller Diagram....................................................................................... 100
Figure 136: Generated & Expected final outputFigure 137: Datapath & Controller Diagram ...... 100
Figure 138: Generated & Expected final output .................................................................................. 101
Figure 139: NPSS in final comparison file ............................................................................................ 102
Figure 140: Integration of the first half of the chain ........................................................................... 103
Figure 141: Handshaking between Resource Element Mapper and IFFT ........................................ 103

X
Figure 142: The frame time specified by the 3GPP ............................................................................. 104
Figure 143: Total Power for the integrated system.............................................................................. 105
Figure 144: Area report for the integrated system .............................................................................. 106
Figure 145: Timing_max (setup) report for the integrated system .................................................... 106
Figure 146: Power Results for all blocks .............................................................................................. 107
Figure 147: Area Results for all blocks ................................................................................................. 107
Figure 148: : Vivado Utilization for Vertex-7 for the integrated chain ............................................. 108
Figure 149: FPGA Architecture ............................................................................................................ 109
Figure 150: VIO ...................................................................................................................................... 110
Figure 151: ILA waveform (at the beginning)...................................................................................... 110
Figure 152: ILA waveform (at the end) ................................................................................................ 110

List of Tables

Table 1: Port description of CRC ............................................................................................................ 27


Table 2: State Table of CRC FSM........................................................................................................... 28
Table 3: Port description table of Tail-bit Conv. encoder ..................................................................... 32
Table 4: State Table of the encoder's FSM ............................................................................................. 34
Table 5: Rate Matcher Interface Table................................................................................................... 40
Table 6: Rate Matcher State Table ......................................................................................................... 46
Table 7: Interface table of scrambler ...................................................................................................... 53
Table 8: Modulator interface table.......................................................................................................... 59
Table 9: Interface Table of NRS generation ........................................................................................... 63
Table 10: State Table of NRS Generation FSM ..................................................................................... 66
Table 11: Interface Table of NRS Index ................................................................................................. 67
Table 12: Interface Table of NPSS .......................................................................................................... 68
Table 13: Interface Table of Element Mapper ....................................................................................... 70
Table 14: Interface Table of REM top level ........................................................................................... 73
Table 15: Comparison between Memory-based and SDF architectures ............................................. 80
Table 16: Block Interfaces........................................................................................................................ 84
Table 17: Top Level Interface Table ....................................................................................................... 98

XI
List of Abbreviations

OFDM Orthogonal Frequency Division Multiplexing


NB-IoT Narrowband Internet of Things
NPDSCH Narrowband Physical Downlink Shared Channel
ICI Inter-Carrier Interference
ISI Inter-Symbol Interference
3GPP 3rd Generation Partnership Project
IFFT Inverse Fast Fourier Transform
LTE Long Term Evolution
PAPR Peak to Average Power Ratio
SC-FDMA Single carrier frequency division multiple access
CP Cyclic Prefix
CRC Cyclic Redundancy Check
NPSS Narrowband Primary Synchronization Signal
NRS Narrowband Reference Signal
FDD Frequency division duplexing
TDD Time division duplex
LAA Licensed-Assisted Access
MSB Most Significant Bit
ASK Amplitude Shift Keying
FSK Frequency Shift Keying
PSK Phase Shift Keying
QPSK Quadrature Phase Shift Keying
𝑵𝒄𝒆𝒍𝒍
𝑰𝑫 Narrowband physical layer cell identity
RNTI Radio network temporary identifier
𝒏𝒇 System frame number
𝒏𝒔 Slot number within a radio frame
𝑻𝒇 Radio frame duration
∆𝒇 Subcarrier spacing
𝑻𝒔 Slot duration
𝒎𝒂𝒙,𝑫𝑳
𝑵𝑹𝑩 The maximum number of resource blocks in uplink.

XII
Chapter One

1. Introduction

1.1. Motivation :
Narrowband Internet of things “NB-IoT” is a low-power wide-area network “LPWAN” radio
technology standard developed by 3GPP for cellular devices and services. LTE was designed for high
speed communication and is not optimized for applications that need to support a potentially large
number of low-rate and low power devices.
These low cost devices, typically used in applications such as sensors, remote maintenance and tracking,
health-care, utilities etc, are expected to have very low complexity and low power consumption with a
very long battery life. With the advent of IoT devices, the total number of connected devices is expected
to reach 28 billion by 2023. Hence, it is desirable to develop LTE based wireless systems suitable for low
data-rate and low power IoT applications. In NB-IoT applications, typically small data volume amounts
need to be infrequently transmitted.
Realizing the importance of IoT for low-power and low cost applications with extended coverage and
very long battery life, 3GPP introduced NB-IoT, as a part of their LTE-Release-14, where NB-IoT is a
new mobile network, which is based on the LTE standard and is used exclusively for IoT applications.
Compared to mobile networks (2G, 3G and 4G), NB-IoT offers energy-saving capabilities that increase
the battery life of simple IoT applications up to 10 years.
NB-IOT technology in particular supports a range of data rates. It depends on the channel quality, or the
signal-to-noise ratio and the quantity of resources in certain areas (bandwidth). Also, each device has a
specific power budget, which leads to combine the power of several devices. There are many applications
on IOT like smart homes, wearables, traffic management, water distribution, smart grid, connected cars,
connected health, … etc.

Figure 1: NB-IoT Applications

1
1.2. OFDM overview :
There are several complexities associated with wireless channels :

 Multi-path Fading : Unlike a wired channel which uses a fixed path, the signals in a wireless
channel can reach a user using multiple paths. All these signals known as multipath
components may have different channel gain and time delay. This combined effect causes
what we know as multipath fading.

Figure 2: Multi-path fading

 Delay Spread : as a consequence of multipath propagation, the duration of a symbol gets


extended. This may interfere with the next symbol. This is called Inter-symbol interference
(ISI) or cross-talk. Guard periods are introduced to avoid cross-talk.

Figure 3: ISI caused by delay spread and guard periods inserting

 Frequency Selective Fading : Signals having bandwidth higher than the coherence bandwidth
of the channel faces variable attenuation at different frequencies, this ultimately distorts the
signal and giving rise to frequency selective fading. Complex channel equalization techniques
are employed to reduce it.

Figure 4: Frequency Selective Fading

2
 Inter-channel Interference : Often signal bandwidth of adjacent carrier frequencies overlap
with each other giving rise to inter-channel interference. Guard bands were introduced to
avoid inter-channel interference.

Figure 5: Inter-channel Interference

All these limitations compounded with the scarcity of bandwidth gave rise to multiple access technique
OFDMA. Before diving into OFDMA, let us understand multi-carrier wireless transmission system.
Suppose a signal to be transmitted over a bandwidth 𝐵, and carrier frequency 𝐹𝑐 . Then symbol time for
this would be 𝑇𝑠 = 1⁄𝐵 , for a single carrier wide-band channel of 1 MHz for example, the symbol time
will be 1 microsecond. Given that the delay spread of the channel is 2 microseconds, then the combined
symbol time would be 3 microseconds, which means delay spread occupies 66 % of the combined symbol
time, thus reducing the efficiency of the channel by 1/3.

Figure 6: Symbol duration in time domain

As delay spread is difficult to control, the effect of delay spread can be minimized by using multiple sub-
carriers of lesser bandwidth, so instead of having a single carrier of 1 MHz, we divide it into 100 sub-
carriers of 10 KHz, each having a symbol duration of 100 microseconds. So a delay spread of 2
microseconds will have a negligible effect over the channel efficiency.

Figure 7: Dividing the carrier into sub-carriers of lesser frequencies

3
This concept is used in FDMA which uses slowly modulating sub-carriers of higher symbol duration. As
these sub-carriers are modulated with data, they gain bandwidth centered around the sub-carriers
frequencies. Guard bands are used to separate them in frequency domain.

Figure 8: FDMA

We can represent this transmitted signal in the equation form as follows :


𝑁

𝑇𝑟𝑎𝑛𝑠𝑚𝑖𝑡𝑡𝑒𝑑 𝑠𝑖𝑔𝑛𝑎𝑙 = ∑ 𝑋𝑛 𝑒 𝑗𝑤𝑡𝑛


𝑛=0

Where the summation of individual symbol multiplied with different carrier frequencies and transmitted
at radio frequencies. OFDMA or Orthogonal Frequency Division Multiple Access is a special case of
FDMA, where users are provided a set of sub-carriers overlapping in frequency domain.
However, these sub-carriers are specially designed to be orthogonal to each other and are placed in a
manner that all other sub-carriers have a zero component at the peak of one sub-carrier which allows them
to occupy the same bandwidth without any interference. This in turn negates the use of guard bands. As a
result, the sub-carriers can be closely packed to increase channel efficiency.

Figure 9: Orthogonal Frequency Multiple Access vs FDMA

4
 For the LTE uplink, a different concept is used for the access technique. Although still using a form of
OFDMA technology, the implementation is called Single Carrier Frequency Division Multiple Access
(SC-FDMA). One of the key parameters that affects all mobiles is that of battery life. Even though
battery performance is improving all the time, it is still necessary to ensure that the mobiles use as little
battery power as possible. With the RF power amplifier that transmits the radio frequency signal via
the antenna to the base station being the highest power item within the mobile, it is necessary that it
operates in as efficient mode as possible. This can be significantly affected by the form of radio
frequency modulation and signal format.

 Signals that have a high peak to average ratio and require linear amplification do not lend themselves
to the use of efficient RF power amplifiers. As a result it is necessary to employ a mode of transmission
that has as near a constant power level when operating. Unfortunately OFDM has a high peak to average
ratio (PAPR) which is proportional to the square of the number of carriers involved.

 In OFDMA, we have one-to-one mapping between symbol and sub-carrier, but SC-FDMA allows a
symbol to be transmitted in parts over multiple sub-carriers. For example, in OFDMA, one symbol
occupies one subcarrier of 15 kHz, but in SC-FDMA, the same symbol is distributed among multiple
sub-carriers of 15 kHz. In short, SC-FDMA behaves like a single carrier system with short symbol
duration compared to OFDMA. SC-FDMA reduces PAPR by reducing the number of carriers.

Figure 10: OFDMA for downlink vs SC-FDMA for uplink

5
1.3. Modes of operation :
NB-IoT is designed to support three different deployment scenarios :
 Stand alone : NB-IoT carrier is deployed independently of any LTE carrier and it can
work as a replacement to GSM carriers.
 Guard band : Occupying a 180 kHz wide physical resource block in the guard band of
existing LTE carrier band and it does not take any capacity from the main LTE traffic
carrier.
 In-band : Occupying one physical resource block (PRB) within the LTE carrier
bandwidth. This mode is the most efficient one as it allows the base station schedule to
multiplex LTE and NB-IoT traffic in the same spectrum.

Figure 11: NB-IoT deployment scenarios

1.4. Frame Structure


According to 3GPP there are three radio frame structures designed for LTE:

 The radio frame structure type 1:


Is only applicable to FDD (for both full duplex and half duplex operation) and has a duration of 10ms
and consists of 20 slots with a slot duration of 0.5ms. Two adjacent slots form one sub-frame of
length 1ms, except when the sub-carrier bandwidth is 1.25 kHz, in which case one slot forms one sub-
frame.

 The radio frame structure type 2:


Is only applicable to TDD and consists of two half-frames with a duration of 5ms each and containing
each either 10 slots of length 0.5ms, or 8 slots of length 0.5ms and three special fields (DwPTS, GP
and UpPTS) which have configurable individual lengths and a total length of 1ms.

 The radio frame structure type 3:


Is only applicable to LAA secondary cell operation. It has a duration of 10ms and consists of 20 slots
with a slot duration of 0.5ms. Two adjacent slots form one sub-frame of length 1ms. Any sub-frame
may be available for downlink or uplink transmission.

6
Frequency Division Duplexing (FDD) is a method for establishing a full-duplex communications link
that uses two different radio carrier frequencies for transmitting (uplink) and receiving (downlink). Both
channels are separated by a defined offset frequency, which is also known as “guard band” which stops
the interference between uplink channel and downlink channel
Time Division Duplexing (TDD) is a technique by which the uplink & downlink transmissions are
carried over the same frequency by using synchronized time intervals,
 Factors to decide whether to use FDD or TDD :
 Latency : FDD offers lower latency than TDD, since transmit and receive functions
operate simultaneously and continuously, whereas in TDD, switching from transmit to
receive incurs a delay that causes traditional TDD systems to have greater inherent
latency.
 Equipment cost : No diplexer is needed to isolate the transmitter and receiver in TDD,
whereas in FDD, a diplexer is needed.
 Distance preferability : As the distance increases, the guard period also increases in
TDD because the point-to-point signal propagation time increases. The increased guard
period significantly affects the efficiency, whereas FDD doesn’t have a problem with
small or large distances.
 Unbalanced Traffic : In real-life network, the volume consumed in downlink is much
higher than in uplink. In TDD, it is possible to dynamically adjust the capacity by
utilizing more timeslots for downlink than uplink, whereas in FDD, the capacity is
normally balanced in both directions.
 MAC layer complexity : MAC layer in TDD system is complex compared to FDD
system, as it has to deal with accurate time synchronization between the systems.

As far as NB-IoT-LTE is concerned, it is defined to support both the paired & unpaired spectrum
for FDD (Radio frame structure Type 1 which is applicable to FDD only).

Figure 12: FDD vs TDD

7
For Narrow Band Downlink Physical Shared Channel, frame type 1 is used.

Figure 13: Radio frame structure type 1

According to the 3GPP


𝑇𝑓𝑟𝑎𝑚𝑒 = 307200 𝑇𝑠 = 10 𝑚𝑠

𝑇𝑠𝑙𝑜𝑡 = 15360 𝑇𝑠 = 0.5 𝑚𝑠


𝑇𝑠𝑢𝑏−𝑓𝑟𝑎𝑚𝑒 = 30720 𝑇𝑠 = 1 𝑚𝑠

1.5. Physical Resource Block Structure for NB-IoT

 Resource block is also known as slot and its transmission time is


about 0.5ms.
 Twelve consecutive subcarriers in the frequency domain and seven
symbols in the time domain form each Resource Block.
 Two slots are combining together forming subframe and ten
subframes are combining together forming a whole frame and it
takes 10ms to transmit the whole frame where each subframe needs
1ms to be transmitted.
 The length of slot differs depends on the CP, in normal CP the slot
contains 7 OFDM symbols and the time taken by the first symbol
is greater than the rest, while in ended CP the slot contains 6
OFDM symbols and the slot’s time is divided equally between the
symbols.

 For NB-IoT, Normal Cyclic Prefix is used and the number of Figure 14: Resource grid with 15 kHz spacing
Resource Blocks used is 1 Resource Block for the NB-IoT with 12
frequencies sub-carriers and 14 symbols for each sub-frame with total BW=180 KHz.

8
1.6. Problem Description

 Realizing the importance of IoT for low-power and low cost applications with extended coverage and
very long battery life, 3GPP introduced narrowband IoT (NB-IoT), as a part of their LTE-Release-14.
Although NB-IoT is based on LTE, but it’s a new radio-access technology as it is not fully backward
compatible with the existing LTE devices. However, it can be easily integrated in the existing LTE
network by allocating some of the time and frequency resources to NB-IoT. NB-IoT occupies 180
kHz of spectrum, which is substantially smaller than LTE bandwidths of 1.4–20 MHz
 The target of this project is to implement low power “Downlink Transmitter” for the Narrow Band
Long Term Evolution (NB-LTE). The main goal is to design the Physical Layer chain of the LTE
Rel.14 that targets the NB-LTE.

 The project has gone through the following phases:


 Standard and literature reading
 Block level Modeling using MATLAB
 RTL Design
 System Integration
 ASIC Synthesis using Synopsys Design Compiler
 Prototyping with FPGA
 Documentation

1.7. Thesis Organization

Chapter One : We proposed an introduction and brief overview about NB-LTE and IoT applications,
then we showed up the complexities associated with wireless transmission, after that we summarized the
NB-LTE frame structure & resource grid.
Chapter Two : We’ll show the NPDSCH LTE Transmitter chain main architecture and a brief definition
for each block.
Chapter Three : We’ll introduce the standard specifications and assumptions according to the 3GPP
Release-14.
Chapter Four : We’ll show the detailed design for each block starting from block diagram then design
steps ending up with the operation clarification followed by simulation results for both MATLAB and
RTL and ASIC synthesis results on DC compiler of timing, area, and power reports
Chapter Five : The system integration is discussed in this chapter with its simulation & synthesis results.
Chapter Six : FPGA prototyping on Vertex-7 and the conclusion are discussed.

9
Chapter Two

2. Physical Downlink Transmitter Chain and sub-blocks’ function

2.1. Transmitter Chain

The Narrow Band IoT LTE Transmitter chain is composed of the shown blocks, in addition it has register
file which supplies the chain with the upper layer parameters, also a controller which feeds and control
the whole chain and that will be discussed in chapter five.

Figure 15: Transmitter Chain Block Diagram

10
2.2. Sub-blocks’ function

2.2.1. Cyclic Redundancy Check (CRC)


We use Cyclic redundancy check as it is easy in binary domain, easy to analyze its mathematics and the
hardware block that generate it consumes low power as it is consists of LFSR with constant length so it is
useful to detect the famous errors. Error detection at the receiving end is achieved by generation the parity
bits from the received information by dividing the data row by the same polynomial equation, if the
reminder is zero so there is no error otherwise there is a presence of transmission errors in the received
frame.

2.2.2. Encoder
In Communication systems, the most important thing is the BER and the efficiency of reliability of the
sending data, as we send digital bits between source and destination through the channel which suffers
from noise and interference, so we need to use channel coding techniques to overcome this problem and
ensure that the data will arrive well, the encoding technique is to add redundancy bits to the data so there
is a decrease in data rate, thus the price paid is to increase the bandwidth to send with the same data rate.
Convolutional Encoding is one of the most powerful techniques for Forward Error correction (FEC), there
are many algorithms for decoding but the most powerful one is Viterbi algorithm in Viterbi Decoding.
As encoder starts and terminates in the same state, the conventionally terminated convolutional codes in
which the LFSR is initialized with zero so the initial state is all zeros so zero tail bits are appended to the
data to drive the encoder back to the all-zero state, In LTE systems, for example, the convolutional
encoder has a constraint length of 7 which means that 6 tail bits are needed, so we append six zeros to the
data which results in 18 bits of overhead for a code rate of 1/3 as shown in figure 16.

Figure 16: The difference between tail-biting convolutional encoder and convolutional encoder

11
In tail-biting encoding, the encoder is initialized by the last bits of the input stream, this is different in
conventional convolutional coding. The tail bits introduce overhead in the system and causes the
terminated convolutional code to introduce some rate loss which results in 18 bits of overhead for a code
rate of 1/3. This represents a large overhead especially if the data blocks to be encoded are short (which is
the case here since the convolutional encoder is used to encode narrow band IoT). Also, tail-biting
provides an equal error protection for all bits compared to the conventional approach where the first few
input bits to the encoder are more protected since only a subset of transitions are allowed for the first few
bits, so we use Tail biting convolution encoder.
Advantages:
 This type of coding saving carrier wave resources especially in OFDM system.
 Overcomes the problem of the added cost of transmitting extra termination bits experienced by
trellis terminations.
 It avoids the rate loss.

Disadvantages:
 Decoding latency is increased due to the fact that initial and final states are required to correctly
start tracing back.
 Receiver complexity is increased.

2.2.3. Rate Matcher

Rate matcher is one of the important blocks in our NPDSCH transmitter chain, it exists at the middle of
the chain to match the number of bits in transport block to the number of bits that can be transmitted to a
given allocation in the resource block.
It also controls the rate of the chain, since the tail bit encoder rate is 1/3 and by using this block, we can
increase the rate or decrease it according to DCI and channel quality information.
This block is also used for interleaving which is very important in communication system based on
encoding techniques to decrease the bit error rate.

2.2.4. Scrambler

Scrambler is an important block in communication systems which used to randomize the input stream of
data and generate an output stream of data with the same length of the input. It is used to eliminate long
run of zeros or ones to avoid both idle and reset cases and helps in clock recovery and having no DC
component. Also, changing in the scrambling codes helps in security.

12
2.2.5. Modulation Mapper

Modulation is a process in which the incoming data stream is modulated onto a carrier, the modulation
process involves switching the amplitude, frequency, or phase of sinusoidal carrier in some fashion in
accordance with the incoming data, there are three basic modulation schemes known as ASK, FSK and
PSK. In NPDSCH, we use QPSK modulation scheme.

2.2.6. Resource Element Mapper

 The resource element mapper maps :


 Incoming QPSK symbols from modulator (data)
 Narrowband Reference Signals (NRS)
 Narrowband Primary Synchronization Signals
(NPSS)
 Resource element mapper's main function is to take the
output (Complex-Valued) symbols of the modulation
mapper & NRS & NPSS and assign them to the proper
subcarriers allocated for the NB-IoT bandwidth.

 NRS stands for Narrow-band Reference Signals that acts


as pilots, which help in channel estimation process to get
an estimate for the channel and equalize the effect of this
channel. These pilots found inside each Sub-frame except
two Sub-frames NPSS and NSSS Sub-frame.

 NRS is transmitted with data symbols at the transmitter


and is used for estimating the channel performance in the
downlink NB-IoT system at the receiver. NRS symbols
are inserted into the specifically assigned subcarriers at
every NB-IoT downlink slot. On each content of the
resource grid which consists of NRS, an inverse discrete
Fourier transform (IDFT) process is executed for
converting it into a time-domain reference sequence with
the addition of CP.
Figure 17: Physical Resource Block
 The NPSS helps UE to synchronize downlink to an NB-IoT cell.
These are transmitted in certain slots based on an 80-ms repetition interval. NPSS is designed to
allow devices to use a unified synchronization algorithm during initial acquisition without knowing
the NB-IoT deployment mode (In-band, Guard-band or stand-alone). This is achieved by avoiding
collision with REs used by LTE as much as possible.

 According to 3GPP, there are specific equations to get out the values of these pilots and other
equations to get the Location of these NRS signals and synchronization signals.

13
2.2.7. IFFT

OFDM (Orthogonal Frequency Division Multiplexing) is a multicarrier modulation technique that uses 'n'
number of orthogonal sub-carriers to become easier to separate out the information contained in each of
these sub-carriers if they're orthogonal, at the receiver.
They're called sub-carriers because of the fact that a given bandwidth (B) is split into 'n' number of
frequencies and each of these frequencies act as individual carriers.
Multiplexing is made possible by this technique. In fact, this technique converts a frequency selective
fading channel into 'n' individual number of narrow band flat fading channels (Frequency Flat Fading
channel is one whose signal bandwidth is less than the coherence bandwidth (bandwidth over which the
channel’s frequency response essentially remains constant) as in figure 18. No distortion occurs in flat
fading channel. However, frequency selective fading channel is one whose signal bandwidth is much
greater than the coherence bandwidth and hence different frequency components get attenuated/distorted
by different amounts, hence aptly called frequency selective fading).

Figure 18: Multi-carrier System

Since 'n' number of sub-carriers are involved, it would necessitate the requirement of 'n' number of
modulators (oscillators) as in figure 19. This would in-turn, increase the cost of design and the device
size. Hence, alternatives were looked upon.
It was found that if complex exponential signals were used as carriers, the function of 'n' modulators
could be replaced completely by a single IFFT operation at the transmitter(By mathematical
manipulation).It's easier to design a single IFFT DSP chip than to design 'n' number of modulators.
Hence, IFFT is used in OFDM transmitters.

14
Figure 19: Simple Analog OFDM System Implementation

The concepts used in the simple analog OFDM implementation can be extended to the digital domain
by using a combination of Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT)
digital signal processing. These transforms are important from the OFDM perspective because they
can be viewed as mapping digitally modulated input data (data symbols) onto orthogonal subcarriers,
as shown in figure 21.

In OFDM system, the input bits are grouped and mapped to source data symbols that are a complex
number representing the modulation constellation point (e.g. QPSK). These complex source symbols
are treated by the transmitter as though they are in the frequency-domain and are the inputs to an IFFT
block that transforms the data into the time-domain. The IFFT takes in N source symbols at a time
where N is the number of subcarriers in the system. Each of these N input symbols has a symbol
period of T seconds. Recall that the output of the IFFT is N orthogonal sinusoids. These orthogonal
sinusoids each have a different frequency and the lowest frequency is DC.

Figure 20: Simplified OFDM System Block Diagram

15
Figure 21: Symbols Mapping by IFFT

Cyclic Prefix (CP)


One of the primary reasons for using OFDM as a modulation format within LTE (and many other
wireless systems for that matter) is its resilience to multipath delays and spread. However it is still
necessary to implement methods of adding resilience to the system. This helps overcome the inter-symbol
interference (ISI) that results from this.
In areas where inter-symbol interference is expected, it can be avoided by inserting a guard period into the
timing at the beginning of each data symbol. It is then possible to copy a section from the end of the
symbol to the beginning. This is known as the cyclic prefix, CP. The receiver can then sample the
waveform at the optimum time and avoid any inter-symbol interference caused by reflections that are
delayed by times up to the length of the cyclic prefix, CP.
The length of the cyclic prefix, CP is important. If it is not long enough then it will not counteract the
multipath reflection delay spread. If it is too long, then it will reduce the data throughput capacity. For
LTE, the standard length of the cyclic prefix has been chosen to be 4.69 µs. This enables the system to
accommodate path variations of up to 1.4 km. With the symbol length in LTE set to 66.7 µs.
The symbol length is defined by the fact that for OFDM systems the symbol length is equal to the
reciprocal of the carrier spacing so that orthogonality is achieved. With a carrier spacing of 15 kHz, this
gives the symbol length of 66.7 µs

Figure 22: Cyclic Prefix

16
Chapter Three

3. Standard Specifications and Assumptions

3.1. CRC
According to Release 14, Denote the input bits to the CRC computation by a0 , a1 , a 2 , a3 ,..., a A1 , and the
parity bits by p0 , p1 , p 2 , p3 ,..., p L1 ,then we attached the CRC bits to the data bits b0 , b1 , b2 , b3 ,..., bB1
,where A is the size of the input sequence and L is the number of parity bits and B = A+ L.

Figure 23: Parity bits added to input bits

The parity bits are generated by one of the following cyclic generator polynomials:
- gCRC24A(D) = [D24 + D23 + D18 + D17 + D14 + D11 + D10 + D7 + D6 + D5 + D4 + D3 + D + 1], which is
added after each code block
- gCRC24B(D) = [D24 + D23 + D6 + D5 + D + 1], which is added only when code block segmentation is
applied
- gCRC16(D) = [D16 + D12 + D5 + 1], which is added to the Master Information Block
- gCRC8(D) = [D8 + D7 + D4 + D3 + D + 1], which is added to the control signal to transmit Channel
Quality Indicator (CQI) information to the uplink channels like (PUCCH and PUSCH) by the
Downlink Control Information (DCI) and (MIB).
Since we are adopting NB-IOT physical downlink shared channel, so max TBS = 2536 and the max value
before using segmentation is Z = 6144, so we will not use segmentation so we choose gCRC24A(D)
gCRC24A(D) = [D24 + D23 + D18 + D17 + D14 + D11 + D10 + D7 + D6 + D5 + D4 + D3 + D + 1]
The design is consisting of 24 LFSR that describes the above generator polynomial.

Figure 24: CRC Linear Feedback Shift Registers

17
3.2. Encoder
The tail-biting convolutional encoding is consisting of an LFSR and the initialization comes from the last
6 bits of data, at each clock cycle one bit enter the block and three bits comes out from the block, so this
means that the rate = 1/3, the XOR of the output streams is as shown in Figure 25.

Figure 25: XORing of encoder's output streams

The encoder output streams d k(0) , d k(1) and d k( 2) correspond to the first, second and third parity streams,
respectively as shown in Figure 25.

The polynomial of the output and its weights:


𝐺0 = [1 0 1 1 0 1 1],
𝐺0 = 1 + 2 + 8 + 16 = 133(𝑜𝑐𝑡𝑎𝑙)
𝐺1 = [1 1 1 1 0 0 1],
𝐺1 = 1 + 8 + 16 + 32 + 64 = 171(𝑜𝑐𝑡𝑎𝑙)
𝐺2 = [1 1 1 0 1 0 1],
𝐺2 = 1 + 4 + 16 + 32 + 64 = 165(𝑜𝑐𝑡𝑎𝑙)

The rate of the encoder is denoted by 𝑅 = 𝑘/𝑛. Here k=1 and n=3 so R=1/3.

18
3.3. Rate Matcher
The rate matching for convolutional coded transport channels and control information consists of
(0) (1) (2)
interleaving the three-bit streams, 𝑑𝑘 , 𝑑𝑘 𝑎𝑛𝑑 𝑑𝑘 followed by the collection of bits and the generation
of a circular buffer as depicted in Figure 26.

Figure 26: Rate Matcher stages

Rate matcher stages: -


1. Sub-block interleavers
(0) (1) (2)
In this stage inputs from tail bit encoder which are 𝑑𝑘 , 𝑑𝑘 𝑎𝑛𝑑 𝑑𝑘 are interleaved according to
(0) (1) (2)
specific sub-block interleaving scheme to generate outputs which are 𝑣𝑘 , 𝑣𝑘 𝑎𝑛𝑑 𝑣𝑘

The sub-block interleaving scheme: -


(𝑖) (𝑖) (𝑖)
The bits input to the block interleaver are denoted by 𝑑0 , 𝑑1 , … … , 𝑑𝐷−1 where D is the number of
bits “D = TBS + 24”, The output bit sequence from the block interleaver is derived as follows:
𝑐𝑐
1. Assign 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 = 32 to the number of columns of the matrix, the columns of the matrix are
𝑐𝑐
numbered from 0,1, 2….., 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 − 1 from left to right.
𝑐𝑐 𝑐𝑐 𝑐𝑐
2. The number of rows of the matrix will be 𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 , such that 𝐷 ≤ 𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 , since
the number matrix elements have to be equal to the maximum possible number of bits, therefore
𝑐𝑐 2536+24
𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 = 32
= 80.
𝐶𝐶
The rows of rectangular matrix are numbered 0, 1, 2,…,𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 − 1 from top to bottom.
𝐶𝐶 𝐶𝐶 𝐶𝐶 𝐶𝐶
3. If (𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ) > 𝐷, then 𝑁𝐷 = (𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 − 𝐷) dummy bits are padded
such that 𝑦𝑘 = < NULL > for k = 0, 1,…, ND - 1.
(𝑖)
Then, 𝑦𝑁𝐷 +𝑘 = 𝑑𝑘 , for k = 0, 1,…, D - 1, and the bit sequence 𝑦𝑘 is written into
𝐶𝐶 𝐶𝐶
the (𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ) matrix row by row starting with bit y0 in column 0 of row 0:

19
𝑦0 𝑦1 𝑦2 ⋯ 𝑦𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1
𝑦𝐶 𝐶𝐶 𝑦𝐶 𝐶𝐶 𝑦𝐶 𝐶𝐶 ⋯ 𝑦2𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 +1 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 +2 −1
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
⋮ ⋮ ⋮ ⋱ ⋮
[𝑦(𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
𝐶𝐶 𝐶𝐶
−1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑦(𝑅𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 +1
𝑦(𝑅𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 +2
⋯ 𝑦(𝑅𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1) ]

4. Perform the inter-column permutation for the matrix based on pattern ⟨𝑃(𝑗)⟩𝑗∈{0,1,...𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1}
that is shown in table 1, where 𝑃(𝑗) is the original column position of the j-th permuted column.
𝐶𝐶 𝐶𝐶
After permutation of the columns, the inter-column permuted(𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ) matrix is
equal to:
𝑦𝑃(0) 𝑦𝑃(1) 𝑦𝑃(2) ⋯ 𝑦𝑃(𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)
𝑦𝑃(0)+𝐶 𝐶𝐶 𝑦𝑃(1)+𝐶 𝐶𝐶 𝑦𝑃(2)+𝐶 𝐶𝐶 ⋯ 𝑦𝑃(𝐶 𝐶𝐶 −1)+𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
⋮ ⋮ ⋮ ⋱ ⋮
[𝑦𝑃(0)+(𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
𝐶𝐶 𝐶𝐶
−1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 𝑦𝑃(1)+(𝑅𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
𝑦𝑃(2)+(𝑅𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘
⋯ 𝑦𝑃(𝐶 𝐶𝐶 𝐶𝐶 𝐶𝐶
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)+(𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 −1)×𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ]

5. The output of the block interleaver is the bit sequence read out column by column from the inter-
𝐶𝐶 𝐶𝐶
column permuted (𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 ) matrix.
(𝑖) (𝑖) (𝑖) (𝑖)
The bits after sub-block interleaving are denoted by𝑣0 , 𝑣1 , 𝑣2 , . . . , 𝑣𝐾𝛱 −1 , where 𝑣𝑜𝑖
𝐶𝐶 𝐶𝐶
corresponds to 𝑦𝑝(0) , 𝑣1𝑖 to 𝑦𝑃(0)+𝐶 𝐶𝐶 … and 𝐾𝛱 = (𝑅𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 × 𝐶𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘 )
𝑠𝑢𝑏𝑏𝑙𝑜𝑐𝑘

Figure 27: inter-column permutation pattern for sub-block interleaver

2. Bit collection

The circular buffer of length 𝐾𝑤 = 3𝐾𝛱 is generated as follows:


(0)
𝑤𝑘 = 𝑣𝑘 for k = 0,…, 𝐾𝛱 − 1
(1)
𝑤𝐾𝛱 +𝑘 = 𝑣𝑘 for k = 0,…, 𝐾𝛱 − 1
(2)
𝑤2𝐾𝛱 +𝑘 = 𝑣𝑘 for k = 0,…, 𝐾𝛱 − 1

20
3. Bit selection and data transmission
Denoting by E the rate matching output sequence length, the rate matching output bit sequence is
𝑒𝑘 , k = 0, 1,.., 𝐸 − 1.

Set k = 0 and j = 0
while {k < E}

if 𝑤𝑗 𝑚𝑜𝑑 𝐾𝑤 ≠< 𝑁𝑈𝐿𝐿 >

𝑒𝑘 = 𝑤𝑗 𝑚𝑜𝑑 𝐾𝑤

k = k +1
end if
j = j +1
end while

3.4. Scrambler

According to 3GPP Narrowband LTE standard release 14, Scrambler implementation is based on pseudo
random sequence that generated from two 31-bit linear feedback shift register (LFSR) and XOR gates.
The two sequences of the two LFSRs get XORed and the output is called a Gold sequence.
The sequence of each LFSR is configure by the following polynomials:

 𝑥1 (𝑛) = 1 + 𝐷 3 + 𝐷 0
 𝑥2 (𝑛) = 1 + 𝐷 3 + 𝐷 2 + 𝐷1 + 𝐷 0
In order to get the required output each LFSR must be initialized so

 𝑥1 (𝑛) will be initialized as follow:


𝑥1 (0) = 1, 𝑥1 (𝑛) = 0 𝑤ℎ𝑒𝑟𝑒 𝑛 = 1,2,3, … … ,30
 𝑥2 (𝑛) will be initialized as follow:
30
𝐶𝑖𝑛𝑖𝑡 = ∑ 𝑥2 (𝑖) ∗ 2𝑖
𝑖=0
𝑛𝑠 𝑐𝑒𝑙𝑙
𝐶𝑖𝑛𝑖𝑡 = 𝑛𝑅𝑁𝑇𝐼 ∗ 214 + 𝑛𝑓 𝑚𝑜𝑑2 ∗ 213 + ⌊ ⌋ ∗ 29 + 𝑁𝐼𝐷
2
Then to work properly there is an initial 1600 shift cycles done to induce more randomness for the initial
state of the two LFSRs, so 𝑁𝑐 = 1600 then the two outputs of the LFSRs get XORed to generate the gold
sequence 𝑐(𝑛) so the output gold sequence is given by:

21
𝑐(𝑛) = (𝑥1 (𝑛 + 𝑁𝑐 ) + 𝑥2 (𝑛 + 𝑁𝑐 ))𝑚𝑜𝑑2

𝑥1 (𝑛 + 31) = (𝑥1 (𝑛 + 3) + 𝑥1 (𝑛))𝑚𝑜𝑑2

𝑥2 (𝑛 + 31) = (𝑥2 (𝑛 + 3) + 𝑥2 (𝑛 + 2) + 𝑥2 (𝑛 + 1) + 𝑥2 (𝑛))𝑚𝑜𝑑2

Figure 28: Gold sequence generation

3.5. Modulation Mapper

According to 3GPP Narrowband LTE standard release 14 there are two main schemes of modulation for
NB-IoT, QPSK and BPSK, both are supported for uplink and only QPSK is supported for downlink so we
will use QPSK only. QPSK is a modulation scheme that maps every two bits to one of four symbols on
the constellation (complex-valued modulation symbol S = I + jQ) as shown in the figure below.

bi , bi+1 I Q

00 1⁄ 1⁄
√2 √2

01 1⁄ −1⁄
√2 √2

10 −1⁄ 1⁄
√2 √2

11 −1⁄ −1⁄
√2 √2
Figure 29: QPSK modulation mapping

Figure 30: QPSK Constellation

22
3.6. Resource Element Mapper

3.6.1. NRS Generation

NRS value depends on three variables 𝑙, 𝑛s, 𝑎𝑛𝑑 𝑚 so that the value
of NRS changes according to any variation of these parameters.
𝑙: indicates 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑦𝑚𝑏𝑜𝑙𝑠 𝑡ℎ𝑎𝑡 𝑐𝑜𝑛𝑡𝑎𝑖𝑛 𝑡ℎ𝑒 𝑁𝑅𝑆 and it
is 𝑒𝑞𝑢𝑎𝑙 5,6
𝑛𝑠: indicates 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡ℎ𝑒 𝑟𝑒𝑐𝑖𝑒𝑣𝑒𝑑 𝑠𝑙𝑜𝑡 𝑖𝑡𝑠 𝑟𝑎𝑛𝑔𝑒 [0,9]
𝑚𝑎𝑥𝐷𝐿
𝑚 = 0,1, … , 2𝑁𝑅𝐵 −1 , 𝑓𝑜𝑟 𝑛𝑎𝑟𝑟𝑜𝑤 𝑏𝑎𝑛𝑑,
𝑚𝑎𝑥𝐷𝐿
𝑁𝑅𝐵 = 1 𝑎𝑠 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑜𝑛𝑙𝑦 𝑜𝑛𝑒 𝑟𝑒𝑠𝑜𝑢𝑟𝑐𝑒 𝑏𝑙𝑜𝑐𝑘
𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒 𝑚 = 0,1
Figure 31: Sub-frame (two slots) of the radio frame

1 1
𝑁𝑅𝑆 𝑣𝑎𝑙𝑢𝑒𝑠 𝑒𝑞𝑢𝑎𝑡𝑖𝑜𝑛 ∶ 𝑟𝑙,𝑛𝑠 (𝑚) = (1 − 2𝑐(2𝑚)) + 𝑗 (1 − 2𝑐(2𝑚 + 1))
√2 √2
Where C refers to the pseudo random sequence generation defined by a length-31 Gold Sequence:

𝑐(𝑛) = (𝑥1 (𝑛 + 𝑁𝑐 ) + 𝑥2 (𝑛 + 𝑁𝑐 )) 𝑚𝑜𝑑2

𝑥1 (𝑛 + 31) = (𝑥1 (𝑛 + 3) + 𝑥1 (𝑛))𝑚𝑜𝑑2

𝑥2 (𝑛 + 31) = (𝑥2 (𝑛 + 3) + 𝑥2 (𝑛 + 2) + 𝑥2 (𝑛 + 1) + 𝑥2 (𝑛))𝑚𝑜𝑑2

𝑇ℎ𝑒 𝑖𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑡ℎ𝑒 𝑓𝑖𝑟𝑠𝑡 𝑚 − 𝑠𝑒𝑞𝑢𝑒𝑛𝑐𝑒 𝑖𝑠 𝑔𝑖𝑣𝑒𝑛 𝑏𝑦 ∶


𝑥1 (0) = 1 𝑎𝑛𝑑 𝑥1 (𝑛) = 0 , 𝑛 = 1,2, … ,30
𝑇ℎ𝑒 𝑖𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑡ℎ𝑒 𝑠𝑒𝑐𝑜𝑛𝑑 𝑚 − 𝑠𝑒𝑞𝑢𝑒𝑛𝑐𝑒 𝑖𝑠 𝑔𝑖𝑣𝑒𝑛 𝑏𝑦 ∶
𝑐𝑒𝑙𝑙 𝑐𝑒𝑙𝑙
𝐶𝑖𝑛𝑖𝑡 = 210 (7(𝑛𝑠 + 1) + 𝑙 + 1)(2𝑁𝐼𝐷 + 1) + 2𝑁𝐼𝐷 + 𝑁𝑐𝑝

𝑤ℎ𝑒𝑟𝑒 𝑁𝑐 = 1600 , 𝑁𝑐𝑝 = 1 𝑓𝑜𝑟 𝑛𝑜𝑟𝑚𝑎𝑙 𝑐𝑦𝑐𝑙𝑖𝑐 𝑝𝑟𝑒𝑓𝑖𝑥

23
3.6.2. Narrowband Reference Signals Location (NRS index)

𝑘 = 6𝑚 + (𝑣 + 𝑣𝑠ℎ𝑖𝑓𝑡 ) 𝑚𝑜𝑑 6

𝐷𝐿 𝐷𝐿
𝑙 = 𝑁𝑠𝑦𝑚𝑏 −2 , 𝑁𝑠𝑦𝑚𝑏 − 1 = 5,6 𝑖𝑛 𝑒𝑎𝑐ℎ 𝑠𝑙𝑜𝑡

𝑘: 𝑟𝑒𝑓𝑒𝑟𝑠 𝑡𝑜 𝑡ℎ𝑒 𝑝𝑙𝑎𝑐𝑒 𝑜𝑓 𝑎 𝑝𝑖𝑙𝑜𝑡 𝑡𝑜 𝑏𝑒 𝑖𝑛 𝑤ℎ𝑖𝑐ℎ 𝑟𝑜𝑤, 𝑟𝑎𝑛𝑔𝑒𝑑 𝑓𝑟𝑜𝑚 0 𝑡𝑜 11 𝑎𝑠 𝑟𝑜𝑤 𝑟𝑒𝑓𝑒𝑟𝑠 𝑡𝑜 𝑡ℎ𝑒 𝑠𝑢𝑏
− 𝑐𝑎𝑟𝑟𝑖𝑒𝑟
𝑙: 𝑟𝑒𝑓𝑒𝑟𝑠 𝑡𝑜 𝑡ℎ𝑒 𝑠𝑦𝑚𝑏𝑜𝑙 𝑛𝑢𝑚𝑏𝑒𝑟, 𝑟𝑎𝑛𝑔𝑒𝑑 𝑓𝑟𝑜𝑚 0 𝑡𝑜 6
𝐷𝐿
𝑁𝑠𝑦𝑚𝑏 : 𝑟𝑒𝑓𝑒𝑟𝑠 𝑡𝑜 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑦𝑚𝑏𝑜𝑙𝑠 𝑖𝑛 𝑜𝑛𝑒 𝑠𝑙𝑜𝑡 𝑓𝑜𝑟 𝑑𝑜𝑤𝑛𝑙𝑖𝑛𝑘 𝑎𝑛𝑑 𝑒𝑞𝑢𝑎𝑙 𝑡𝑜 7

0 𝑖𝑓 𝑙 = 5
𝑣={
3 𝑖𝑓 𝑙 = 6
𝑐𝑒𝑙𝑙
𝑣𝑠ℎ𝑖𝑓𝑡 = 𝑁𝐼𝐷 𝑚𝑜𝑑 6
max 𝐷𝐿 max 𝐷𝐿
𝑚 = 0,1, … ,2𝑁𝑅𝐵 − 1 𝑓𝑜𝑟 𝑛𝑎𝑟𝑟𝑜𝑤 𝑏𝑎𝑛𝑑 𝑁𝑅𝐵 = 1 𝑎𝑠 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑜𝑛𝑙𝑦 𝑜𝑛𝑒 𝑟𝑒𝑠𝑜𝑢𝑟𝑐𝑒 𝑏𝑙𝑜𝑐𝑘
𝑠𝑜 𝑚 = 0,1

3.6.3. Narrowband Primary Synchronization Signals (NPSS)

𝑇ℎ𝑒 𝑏𝑎𝑠𝑒 𝑠𝑒𝑞𝑢𝑒𝑛𝑐𝑒 𝑖𝑛 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑑𝑜𝑚𝑎𝑖𝑛 𝑑𝑙 (𝑛) 𝑖𝑠 𝑑𝑒𝑓𝑖𝑛𝑒𝑑 𝑎𝑠 𝑓𝑜𝑙𝑙𝑜𝑤𝑠


𝝅𝒖𝒏(𝒏+𝟏)
𝑑𝑙 (𝑛) = 𝑺(𝒍) . 𝒆−𝒋 𝟏𝟏 , 𝒖=𝟓 , 𝒏 = 𝟎, 𝟏, … , 𝟏𝟎
𝑇ℎ𝑒 𝑐𝑜𝑑𝑒 𝑐𝑜𝑣𝑒𝑟 𝑖𝑛 𝑡𝑖𝑚𝑒 𝑑𝑜𝑚𝑎𝑖𝑛 𝑆(𝑙) 𝑖𝑠 𝑑𝑒𝑓𝑖𝑛𝑒𝑑 𝑎𝑠 𝑓𝑜𝑙𝑙𝑜𝑤𝑠
𝑆(0: 10) = {1,1,1,1, −1, −1,1,1,1, −1,1}

Figure 32: Mapping of NPSS in the frame

24
3.6.4. Data Rate Calculation for NPDSCH

 𝑭𝒐𝒓 𝒂𝒏𝒚 𝑵𝑷𝑫𝑺𝑪𝑯 (𝒂𝒄𝒕𝒖𝒂𝒍 𝒓𝒂𝒕𝒆)


7 𝑂𝐹𝐷𝑀 ∗ 12 𝑠𝑢𝑏𝑐𝑎𝑟𝑟𝑖𝑒𝑟𝑠 ∗ 2 (𝑄𝑃𝑆𝐾)
𝑀𝑎𝑥. 𝑅𝑎𝑡𝑒 = = 336 𝐾𝑏𝑝𝑠
0.5 𝑚𝑠 (𝑠𝑙𝑜𝑡 𝑡𝑖𝑚𝑒)

 𝑼𝒔𝒆𝒇𝒖𝒍 𝒅𝒂𝒕𝒂 𝒓𝒂𝒕𝒆

 𝐹𝑜𝑟 𝑜𝑢𝑟 𝑇𝐵𝑆 = 960 , 𝑐𝑜𝑑𝑖𝑛𝑔 1⁄3 → 960 ∗ 3 = 2880 , 𝑄𝑃𝑆𝐾 →


1440 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
𝑆𝑖𝑛𝑐𝑒 𝑒𝑎𝑐ℎ 𝑠𝑙𝑜𝑡 𝑐𝑎𝑛 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡 [(7 ∗ 12) − 4 (𝑝𝑖𝑙𝑜𝑡 𝑠𝑦𝑚𝑏𝑜𝑙𝑠)] = 80 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
1440
𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒 𝑤𝑒 𝑛𝑒𝑒𝑑 𝑐𝑒𝑖𝑙 [ ] = 18 𝑠𝑙𝑜𝑡𝑠 = 9 𝑠𝑢𝑏𝑓𝑟𝑎𝑚𝑒𝑠
80
960
𝐷𝑎𝑡𝑎 𝑅𝑎𝑡𝑒 = = 96 𝐾𝑏𝑝𝑠
9 𝑚𝑠 + 1 𝑚𝑠

 𝐹𝑜𝑟 𝑇𝐵𝑆 = 296 , 𝑐𝑜𝑑𝑖𝑛𝑔 1⁄3 → 296 ∗ 3 = 888 , 𝑄𝑃𝑆𝐾 → 444 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
𝑆𝑖𝑛𝑐𝑒 𝑒𝑎𝑐ℎ 𝑠𝑙𝑜𝑡 𝑐𝑎𝑛 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡 [(7 ∗ 12) − 4 (𝑝𝑖𝑙𝑜𝑡 𝑠𝑦𝑚𝑏𝑜𝑙𝑠)] = 80 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
444
𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒 𝑤𝑒 𝑛𝑒𝑒𝑑 𝑐𝑒𝑖𝑙 [ ] = 6 𝑠𝑙𝑜𝑡𝑠 = 3 𝑠𝑢𝑏𝑓𝑟𝑎𝑚𝑒𝑠
80
296
𝐷𝑎𝑡𝑎 𝑅𝑎𝑡𝑒 = = 98.666 𝐾𝑏𝑝𝑠
3 𝑚𝑠

 𝐹𝑜𝑟 𝑀𝑎𝑥. 𝑇𝐵𝑆 = 2536 , 𝑐𝑜𝑑𝑖𝑛𝑔 1⁄3 → 2536 ∗ 3 = 7608 , 𝑄𝑃𝑆𝐾 →


3804 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
𝑆𝑖𝑛𝑐𝑒 𝑒𝑎𝑐ℎ 𝑠𝑙𝑜𝑡 𝑐𝑎𝑛 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡 [(7 ∗ 12) − 4 (𝑝𝑖𝑙𝑜𝑡 𝑠𝑦𝑚𝑏𝑜𝑙𝑠)] = 80 𝑠𝑦𝑚𝑏𝑜𝑙𝑠
3804
𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒 𝑤𝑒 𝑛𝑒𝑒𝑑 𝑐𝑒𝑖𝑙 [ ] = 48 𝑠𝑙𝑜𝑡𝑠 = 24 𝑠𝑢𝑏𝑓𝑟𝑎𝑚𝑒𝑠
80
2536
𝐷𝑎𝑡𝑎 𝑅𝑎𝑡𝑒 = = 93.926 𝐾𝑏𝑝𝑠
24 𝑚𝑠 + 3 𝑚𝑠

25
3.7. IFFT & CP

LTE supports different channel bandwidths and as a result a different number of resource blocks can be
supported. Where the resource block is the smallest unit of resources that can be allocated to a user. The
resource block is 180 kHz wide in frequency and 1 slot long in time. In frequency, resource blocks are 12
x 15 kHz, where 12 is the number of the subcarriers where each one is 15 kHz spacing. In NB-IOT we
have only one resource block because the targeted devices is not expected to operate with huge data rates
as in the LTE.
As in DL we use only QPSK and we have a sub-frame of length 0.5ms which consists of 7 OFDM
symbols each with 12 sub-carrier, this constrained the data rate to be 336 Kbps which is not huge as in
LTE which may reach 1Gbps in DL and 500 Mbps in UL.
As we have only 12 subcarriers in NB-IOT so we choose the IFFT size to be 16-point to reduce power,
area and latency as it is the main goal of the NB-IOT, In the 16-point first 12 subcarriers are carrying data
and the last four are padded zeroes which act as a guard band for the transmitted data.
In LTE there is two types of cyclic prefix, normal cyclic prefix where this configuration has 7 OFDM
symbol within the resource block and extended cyclic prefix which has 6 OFDM symbol, NB-IOT
(downlink) (Release 14) only supports the normal cyclic prefix where OFDM symbol duration is 66.7 µs
and a CP of 5.02 µs for first symbol and 4.67 µs for rest of OFDM symbols. So the OFDM symbol length
for first OFDM symbol is 71.18 µs and the remaining 6 symbols are 71.39 which leads us to the total
length of the slot (resource block) which is 0.5 ms.
We implemented the I and Q to be 16-bit for each, to fit the range of numbers we deal with, and we notice
that the last 4 bits of each 16 bits of the IFFT output is redundant, so we can consider the 2 DACs (next
block after IFFT) are 12-bit, one for I and the other for Q.

26
Chapter Four

4. Design Implementation
4.1. CRC

4.1.1. Block Diagram

Figure 33: CRC top level

Figure 34: CRC Architecture

4.1.2. Interface Table

Port Direction Width Description

clk input 1 bit System clk

rst_n input 1 bit Block reset

data input 1 bit serial input data its size is TBS

When its high, the CRC will start to work, always high while the block is working
CRC_En input 1 bit
or for one clk cycle the system will work

TBS input 12 bits The length of data, comes from the upper layer, its max value is 2536

CRC_out output 24 bits the CRC bit comes out when valid is heigh, its name in the code is mem_elem

CRC_valid output 1 bit it represents the data is ready in the CRC and it becomes high for one clk cycle

Encode_init output 6 bits the last 6 bits in the CRC bit we out this bit in parallel to the encoder
Table 1: Port description of CRC

27
4.1.3. Operation

• First, initialize the register with zero values.


• Second, we have the equation of LFSR so we designed the shift register as described in chapter 3.
• Third, shift the data by TBS
• Fourth, send the bits found in the reg to the RAM to shift them after the data, and the last 6 bit of
output is also sent to the encoder to initialize its registers

4.1.4. FSM

This state that the crc CRC_valid=0


will be on it until its
Shift=0
enable comes
IDLE

The fsm inter this


CRC_en=1
state for one clk cycle
to make the valid high
for one clk cycle

CRC_valid=0
S1 Shift=1
S0
Count_done=1
All the function of the crc is
CRC_valid=1
maked in this state, as it shift the
Shift=0 data in LFSR foe number of clk
cycles=tbs

State Table
Current State Status signal Next State Control Signals
IDLE S0 CRC_valid=0 , Shift=0
If (CRC_en==1)
S0 S1 CRC_valid=0 , Shift=1
If (count_done==1)
S1 (Next Cycle) IDLE CRC_valid=1 , Shift=0
Table 2: State Table of CRC FSM

28
4.1.5. Simulation results

1) Comparing our MATLAB function with the built-in function


Here we generate the data stream and calculate its CRC values then compare this bit with the output
of the built-in function

Figure 35: Generated & built-in MATLAB functions comparison of CRC

2) Comparing MATLAB with RTL

As shown in the figure below, when the CRC_valid is high so the data in mem_elem is the correct CRC
bits (the first signal), and y signal contain the CRC bits that generated from MATLAB so they are the
same.

Figure 36: Correct CRC bits in RTL as MATLAB

Finaly we generate multiple test cases to cover all the corner cases in matlab and compare it with
verilog code outputs

29
4.1.6. Synthesis Results

Figure 37: CRC Power report

Figure 38: CRC Area report

30
Figure 39: Timing_max (setup time) report of CRC

4.1.7. Latency

The number of clock cycles from the enable signal to the valid signal to be high = TBS

31
4.2. Encoder

4.2.1. Block Diagram

Figure 40: Tail-bit convolutional encoder top level

4.2.2. Interface Table

Port Direction Width Description


CLK input 1 bit System clock
rst_n input 1 bit Block reset

data input 1 bit Serial input data its size is TBS comes from the serializer

When its high the encoder will work, can work for one clock
Encoder_En input 1 bit cycle or more but it should be high after the crc_valid signal
is high

The last 6 bits in the CRC bit we take this bit in parallel to
Encoder_initial input 6 bits
the encoder, they are ready when crc_valid is high

TBS input 12 bits The length of data comes from the upper layer; its max
value is 2536
CRC_valid input 1 bit it represents the data is ready in the CRC and it becomes
high for one clock cycle
Out1 output 1 bit The first encoded output (G1)

Out2 output 1 bit The second encoded output (G2).

Out3 output 1 bit The third encoded output (G3).

Valid signal becomes high when the data is ready to come


Encoder_valid output 1 bit
out, becomes high after 1 clock from the encoder_en
Table 3: Port description table of Tail-bit Conv. encoder

32
Figure 41: tail-bit conv. encoder architecture

4.2.3. FSM

This state that the


encoder will be on
And waiting until
CRC_valid comes

IDLE
CRC_valid=1
Counter_done=1

S0
S2
After the CRC_valid signal
is high so the CRC bits is
State for shifting the ready so this stat is to
bits and still raise the initialize the register
valid signal high for Encoder_en=1
number of clk cycles
equal tbs-1
S1
State for starting the
shifting of the bits
and raise
33 the valid
signal high
Current State Status signal Next State Control Signals
Shift = 0

IDLE If (crc_valid==1) S0 Zero = 0


encoder_valid = 0

Shift = 0
If (encoder_en==1)
S0 S1 Zero = 1
encoder_valid = 0
Shift = 1
(Next Cycle)
S1 S2 Zero = 0
encoder_valid = 0
Shift = 1

S2 If (count_done==1) IDLE Zero = 0;


encoder_valid = 1
Table 4: State Table of the encoder's FSM

4.2.4. Operation

 First, initialize the register with last 6 bits of data


 Second, shift the data by the number of the bits equal TBS and raising the valid signal
high.
 Third, the data enter serially so each clock cycle one bit entered and three bits comes out
until all data out from encoder

34
4.2.5. Simulation results

1) Comparing MATLAB with built in

Here we generate the data stream and calculate three encoder output streams values then compare this
stream with the output of the built-in function then print it in a file, and this is the output of the
comparison in MATLAB

Figure 43: Generated & built-in first encoder output stream Figure 42: Generated & built-in second encoder output stream

Figure 44: Generated & built-in third encoder output stream

35
2) Comparing matlab with verilog

The figure below shows that the block started to generate the three outputs and assert the encoder_valid
high after one clock cycle from the encoder_en as expected

Figure 45: Encoder output streams and valid signal

As the data output is serial, so we collect out1 in string_out1 and out2 in string_out2 and out3 in
string_out3 and compare when the encoder_valid signal comes low the string_out with the data
generated from MATLAB, so we compare string_out1 with y1 and so on.

Figure 46: Comparing MATLAB and RTL results

Finaly we generate multiple test cases to cover all the corner cases in matlab and compare it with
verilog code outputs

36
4.2.6. Synthesis Results

Figure 47: Encoder Area report

Figure 48: Encoder power report

37
Figure 49: Encoder Timing_max (setup time) report

4.2.7. Latency

The number of clock cycles from the enable signal to the valid signal to be high = 1

38
4.3. Rate Matcher

We have designed the rate matcher block through two designs which are slightly different in some
features, hardware and algorithm.
The Second Design is more optimized and improved than the first design.

4.3.1. Block Diagram

Figure 50: Rate Matcher main architecture

39
4.3.2. Interface Table

Port Direction Width Description


rm_clk Input 1 bit Operating clock.

rm_rst_n Input 1 bit Reset the whole block.

rm_enable_ld Input 1 bit Enable block to read input stream (Valid from tail biting encoder)
In_d0 Input 1 bit First input stream which is d0 output from the tail biting convolution
encoder.

In_d1 Input 1 bit Second input stream which is d1 output from the tail biting
convolution encoder.

In_d2 Input 1 bit Third input stream which is d2 output from the tail biting convolution
encoder.
data_bits Input 12 bits Upper layer parameter to determine transport block size in addition
to 24 bits of CRC.

out_enable Input 1 bit Signal that enables output stream to get out

out_control Input 2 bits Determine if output stream will be 24 bits  2’b11


20 bits  2’b01

data_out Output 1 bit Output bit stream from the block.

rm_valid Output 1 bit Valid output enables for the next block to start reading the RM
output.
data_end Output 1 bit Indicates that all data bits are got out and the block is waiting for
new input stream.

bit_sel_done Output 1 bit Indicates that the selection of 24 bits or 20 bits is done and the block
is ready to get out the output data

Table 5: Rate Matcher Interface Table

4.3.3. General Steps of Operation

1. Serial to parallel and interleaving


In this step, input stream from tail bit encoder is registered in 32-bit register according to look-up
table defined in the standard to allow interleaving of input bits.
2. Bit collection
Every 32 bits which are interleaved and registered is then send to written in a register file (80*32)
So, we have 3 register files for 3 input streams

40
Maximum TBS is 2536 bits in addition to 24 bits CRC, then maximum depth*width of register file is
80*32 = 2560.
3. Bit selection
In this step, we select data bits from the register file column by column and get rid of dummy bits
which were padded before in the register file to complete the empty places in each register file.
NOTE  the difference between the first design and the second one is mainly based on this step and
how to optimize it.
4. Output stage
In this step, output is ready to get out and wait for enable signal, also control signal is important here to
inform the block about number of bits needed to get out “24 bits or 20 bits” according to demand of the
system.

4.3.4. First design

We will talk briefly about the first design and its idea of operation then; we will go to the second design
as it is the final design we have reached and the more optimized and efficient design.

Figure 51: Illustrative architecture of first design

4.3.4.1. Operation

1. After bit collection stage, when all data bits are saved in the register files and the input
streams from the encoder have been finished, bit selection stage starts.
2. Data is selected from the first register file bit by bit each clock cycle column by column.
3. The first bit is selected from the row which includes both data bits and dummy bits and we
differentiate between them using the null register which we have designed in the first stage of
the rate matcher block, In short, the null register is 32 bits register corresponds to the mixed

41
row in each register file “the row which includes both data bits and dummy bits”, if any bit of
the 32 bits is “1” therefore there is a data bit in the corresponding index and if any bit of the
32 bits is “0” therefore there is a dummy bit in the corresponding index and we will not
select it. Look at figure 53.
4. After the selection from the register file, we repeat the previous steps with register file 2 then
3 and all selected bits are saved in the circular buffer from the least significant bit to the most
one.
5. When output enable becomes high  rm_valid gets high and data started to get out
according to the value of out_control
6. Output data is 24 serial bits stream or 20 bits only, this according to the LTE narrow band IoT
resource block which demands 12 QPSK symbols in the first 5 OFDM symbols and 10 QPSK
symbols with 2 reference signals symbols in OFDM symbol 6 and 7. Look at figure 52

Figure 53: Register File Implementation Figure 52: Resource Allocation

Figure 54: The used Permutation look-up table

42
4.3.4.2. Disadvantages of the design
i. The large latency
Latency of design #1 = (TBS + 24) + (3*(TBS+24))
 The first term is during the input bits stream which we can’t deal with.
 The second term is due to the bit selection from the register files and saving data bits
in the large circular buffer from 3 register files.

ii. The large area and power


 The circular buffer is a large register consists of 7680 flip flops in addition to the 3
register files, so we have about 15k flip flops in the design.
 Also, number of LUTs is large appears after synthesizing this block.
4.3.5. Second design

4.3.5.1. Idea of optimization

 We have a problem from the first design regarding the large latency appears in bit selection
stage in addition to the large area represented in the large circular.

 We also discovered that the IFFT block “the last block in our system’s chain” needs about
137 clock cycles to operate on 1 OFDM symbol only and during these clock cycles all blocks
in the chain are halted waited for IFFT to finish.

 So, we can benefit from this by selecting and registering the output bits either 24 or 20 bits
from the register files in small registers for a maximum 30 clock cycles instead of the large
circular buffer and wait for IFFT to finish then enable signal becomes high to get out the
output then registering the data bits again in small registers and wait and so on.

Figure 55: Illustrative architecture of second design

43
4.3.5.2. Operation

1) The s_to_p module

Figure 56: Serial-to-parallel module

1. Data input streams are registered in 6 registers (2 for each stream, one for the first row in the
register file which is the mixed row including data bits and dummy bits)
 Sp_out0, sp_out1 and sp_out2 are for the mixed row using counter 1 counts from 0
to the number of data bits in the mixed row
 Sp_out0, sp_out1 and sp_out2 are for the rest below rows in the register files using
counter 2 counts from 0 to 32 until data input streams are finished
2. Registering data input bits is based on the look-up table.
3. At the same time of counter 1 counting, the bit of the corresponding location in null register
becomes 1.
4. At the beginning address is equal to the position of the mixed row then it is incremented.
5. When write enable is high, we wrote in the register files.

44
2) Register files and bit selection module

Figure 57: Register files and bit selection module

1. After the input data are completely registered in the register files and we ready to select data bits,
we wait until out_control signal changes its value to 01  select bits and save them in the 24-bit
registers or 11  select bits and save them in the 20-bits register.
2. The block then is halted until out_enable becomes high for 1 clock cycle then rm_valid becomes
high and data_out stream gets data out either from the 24-bit register or 20-bit register according
to:
i. Out_reg_24_en  data is got out from 24-bit register.
ii.Out_reg_20_en  data is got out from 20-bit register.
3. Each register file “regArr, regArr_1 and regArr_2” has index for column and index for rows
which are incremented during selection of bits
4. Idx_0_0, Idx_0_1 and Idx_0_2  each starts from the mixed row until row 79.
5. Idx_1_0, Idx_1,1 and Idx_1,2  each starts from column 0 to column 31.
6. When the first register file indices reach their last positions “idx_0_0  79, Idx_1_0  31, conv
register changes it’s value to “01” and selection from the second register file starts, when the
second register file indices reach their last positions, conv changes it’s value to “11” then to “10
after the third register file indices reach their last positions and by this all data has been selected.
7. bit_sel_en  to enable the step of selection of data bits and get rid of null bits by the help of null
register and register data bits in the 24- or 20-bits registers.
8. bit_sel_done  when the step of bit selection is finished.
9. Output_ready  from rate matcher finite state machine to be ready for output stage when output
enable becomes high.
10. Output_done  when all either 24 or 20 bits are got out.
11. Data_end  signal sent to finite state machine when all data in the register files have got out to
go to IDLE state to be ready for a new input stream.

45
3) The Finite State Machine

Figure 58: Rate Matcher State Diagram

Current state Next state Description of the current state


 INITIALIZATION  if rm_en = 1 Waiting for the rate matcher block to operate or a new
IDLE
 IDLE  if rm_en = 0 input stream is available.

Reading data_bits which is the number of data input


INITIALIZATION  CALCULATIONS
bits  TBS + 24.

Calculating:-
CALCULATIONS  READY  The position of the mixed row in the register files.
 The number of data bits in the mixed row.

 S_TO_P  rm_enable_ld = 1 The block is ready for input streams from tail bit
READY
 READY  rm_enable_ld = 0 convolutional encoder.

 BIT_SELECTION  rm_enable_ld = Convert the input streams from serial input to 32


S_TO_P 0 parallel bits registers, registered according to look-up
 S_TO_P  rm_enable_ld = 1 table then saved in the register files.

 BIT_SELECTION  (sp_done) &&


(!bit_selection_done) && (!data_end) Selecting data bits from the register files and waiting
BIT_SELECTION  OUTPUT  (sp_done) && for the output enable.
(bit_selection_done) && (!data_end) Getting rid of the dummy bits during selection.
 IDLE  data_end = 1

OUTPUT  (101) BIT_SELECTION  output_done = 1 Data_bits are got out.


Table 6: Rate Matcher State Table

46
4.3.5.3. Enhancements and improvements in the second design: -

1. The small latency


Compared to the first design latency here is very small.
Latency of design #2 = (TBS + 24) + (24 or 25)

2. The small area and power


As the absence of the large buffer allow us to have a maximum of 7k flip flips not 15k used in the
register files also there is a large decrease in the number of LUTs.

4.3.6. Simulation results

We generate a simple MATLAB code define the algorithm of the rate matcher for convolutional encoding
and generate a random bit stream “d0, d1 and d2”, then we compared the output from our generated code
with the built-in function “lteRateMatcherConvolutional” and the result was that the two outputs are
identical and our generated algorithm is correct.

Figure 59: comparison between MATLAB built in and our code

47
RTL simulations
We have tested 2 different streams, the first input stream was 360 bits from the encoder and compared the
output bits by the output bits from the MATLAB, after all data bits have got out and the block returned to
IDLE state, we have tested another input stream which was 160 bits and applied the previous steps again.

Figure 61: first 24 bits in MATLAB Figure 60: first 24 bits in RTL

Figure 63: first 20 bits in MATLAB


Figure 62: first 20 bits in RTL

Final Results

Figure 64: RTL final simulation results

48
Waveforms
As shown in the figure below, when active low reset becomes “0”, the whole returns to IDLE state and
we start reading input when encoder valid becomes high.

Figure 65: states transition from reset to input read

Reading 360 bits in serial from tail bit convolutional encoder through 3-bit streams in_d0, in_d1 and
in_d2 when rm_enable_ld is high.

Figure 66: Input stream reading

When out_control changes for 1 clock cycle to “11”, this means that we will select 24 bits and be ready
for out_enable to become high for output stage.

Figure 67: Bit selection and out_control new value

49
When out_enable becomes high, rm_valid becomes high and data is got out serially.

Figure 68: Output state

At the end of the last output stream data_end flag becomes high indicates that all data in the register files
are got out and the rate matcher block will return to IDLE state for a new input stream.

Figure 69: Last output data

50
4.3.7. Synthesis Results

Figure 70: Rate Matcher Area report

Figure 71: Rate Matcher Power report

51
Figure 72: Rate Matcher Timing_max (setup time) report

52
4.4. Scrambler

4.4.1. Block Diagram

Figure 73: Scrambler Block Diagram

4.4.2. Interface Table

Port Direction Width Description

clk Input 1 bit System clock.

rst_n Input 1 bit Block reset.

scr_enable Input 1 bit Block enable

scr_enable_load Input 1 bit Enable block by valid out of rate matcher to start reading the input bit stream.

scr_data_in Input 1 bit Input bit stream coming from rate matcher.

rnti Input 16 bit Radio Network Temporary ID given from upper layer.

ns_4msb Input 4 bit First slot number 4msb given from upper layer.

nf_lsb Input 1 bit First system frame number LSB given from upper layer.

cell_id Input 9 bit Physical layer cell identity given from upper layer.

scr_out Output 1 bit Output bit scrambled by the gold sequence.

Signal high with the output bit for the next block to know that is a valid
scr_out_valid Output 1 bit
output to read.
Table 7: Interface table of scrambler

53
Figure 74: Scrambler architecture

4.4.3. Operation

There is a finite state machine controlling the states which the scrambler will be work in it so:

 After reset, the scrambler goes to an initialization state to assert the initial values of the LFSRs
using the upper layer parameters.
 After LFSRs initialization is done, shift state is enabled to start the 1600 of shifting to have the
gold sequence ready and that take 1600 clock cycles.
 After the 1600 cycle of shifting, the block is waiting in its state for a valid input coming from rate
matcher to start reading the input.
 Then when the valid become high the block start reading the input and XORing it with the gold
sequence generated by the two LFSRs to generate the output.

Figure 75: Scrambler FSM

54
4.4.4. Challenges

 First challenge was calculating the initial value of x2 as there is more than one multiplication but
it was noticed that all multiplication are by number of base 2 so it can just performed using shift
and concatenation only.
 Second challenge is the large number of input ports because of upper layer parameters used in x2
initial value calculation so as an optimization not all bits were taken as input for some parameters
as for first system frame number 𝑛𝑓 it is consists of 10 bits but only LSB bit is used in the design
as it’s the result of 𝑛𝑓 𝑚𝑜𝑑2, and for the first slot number it consists of 5 bits but only 4 MSBs is
𝑛
used in the design as they are the result of ⌊ 𝑠⁄2⌋.

4.4.5. Future work:

We can optimize in the number of cycles for the initial shift state and instead of using 1600 cycle we can
use only 1 clock cycle and that is done by calculating the indices of the bits in the LFSR we need to XOR
together to get the value of each bit in the LFSR after 1600 cycle using the following equations:

𝑥1 (𝑛 + 31) = (𝑥1 (𝑛 + 3) + 𝑥1 (𝑛))𝑚𝑜𝑑2

𝑥2 (𝑛 + 31) = (𝑥2 (𝑛 + 3) + 𝑥2 (𝑛 + 2) + 𝑥2 (𝑛 + 1) + 𝑥2 (𝑛))𝑚𝑜𝑑2

So we substitute 𝑛 = 1600 and then get the indices for XORing for each bit in LFSR so the indices will
be from 0 to 31.
This method will be apply only on 𝑥2 as the initial value before shift changes according to the upper
layer parameters but 𝑥1 have a constant initial value so we can calculate its value after 1600 cycle and
write it every time we need to initialize it.

55
4.4.6. Simulation results

Figure 76: MATLAB output for scrambler NcellID = 65535, nf = 1, ns = 0

Figure 77: RTL output for scrambler NcellID = 65535, nf = 1, ns = 0

56
4.4.7. Synthesis Results

Figure 78: Scrambler area report

Figure 79: Scrambler Timing_max (setup time) report

57
Figure 80: Scrambler power report

58
4.5. Modulation Mapper

4.5.1. Block Diagram

Figure 81: Modulation Mapper Block Diagram

4.5.2. Interface Table

Port Direction Width Description

clk Input 1 bit System clock.

rst_n Input 1 bit Block reset.

Enable the block by the valid out of scrambler to start reading the
mod_enable Input 1 bit
input bit stream.

data_in Input 1 bit Input bit stream coming from scrambler.

mod_i Output 16 bit Real component of the output symbol.

mod_q Output 16 bit Imaginary component of the output symbol.

Signal high with the output symbol for the next block to know that
mod_valid Output 1 bit
is a valid output to read.
Table 8: Modulator interface table

59
4.5.3. Operation
As we use QPSK modulation scheme so the block maps each two bits to their corresponding symbol so it
takes 2 clock cycles to generate one symbol to the next block, the symbol consists of 16-bits the most 5
bits for integer and the fraction part is 11 bits so for

 +1⁄ = 0000010110101000
√2
 −1⁄ = 1111101001011000
√2
We design the modulation mapper with two approaches:
 The first approach was introduced using the information of the incoming “tbs” from upper layer :
 The first incoming bit is stored in a 1-bit register “temp”, when the second input bit comes,
the multiplexing takes place and generate the corresponding symbol.
 Similarly, the third bit is stored and compared along with the fourth bit, and so on.
 Generally, in the odd clock cycles, the incoming bit is stored. In the even clock cycles, the
stored bit is compared along with the incoming bit in this even cycle to generate the output
symbol.
 That was achieved by a counter which counts till the upcoming “tbs” and by using the
condition that if the LSB of the counter is one then it is an odd cycle, so it’s expected to store
the bit for one more cycle, otherwise it is an even cycle, so it’s expected to have an output
symbol.
 Since each 2 bits are mapped to the corresponding symbol, therefore it takes 2 clock cycles to
generate one symbol to the next block.
 The second approach was introduced without using the information of the incoming “tbs” from upper
layer:
But using a bit “is_odd” which toggles every clock cycle:
 If it is low: then the incoming bit is stored.
 If it is high: then the incoming bit is compared along with the previous one to get
multiplexed to generate the output symbol.
At the end we use the second approach as we didn’t use the 12-bit counter of the tbs and use 1-bit toggle
every cycle only.
As a conclusion the first bit is used to generate the real part of the symbol (I) and the second bit is used to
generate the imaginary part of the symbol (Q) as shown in the figure below.

Figure 82: Modulation Mapper symbol generation

60
4.5.4. Simulation results

We tested the four cases of the QPSK modulator and all pass in MATLAB and RTL simulation so there is
an example for one of the cases.
For input stream: “1 0”

Figure 83: Modulation Mapper RTL output

Figure 84: Modulation Mapper MATLAB output

4.5.5. Synthesis results

Figure 85: Modulation Mapper area report

61
Figure 86: Modulation Mapper power report

Figure 87: Modulation mapper Timing_max (setup time) report

62
4.6. Resource Element Mapper
4.6.1. NRS Generation
4.6.1.1. Block Diagram

Figure 88: Block Interface of NRS generation


4.6.1.2. Interface Table
Port Type width Description
clk Input 1 bit System clock

rst_n Input 1 bit Block reset

nrs_enable Input 1 bit Block enable


subframe_rem
Input 1 bit Comes from element mapper indicating that a sub-frame is done
_done

toggle_nrs Input 1 bit Comes from element mapper to decide which part of the pilots memory will be read

n_cell_id Input 9 bits Physical layer cell identity given from upper layer

nrs_i_out1 Output 16 bits In-phase output symbol of the narrow band reference signal

nrs_q_out1 Output 16 bits Quadrature-phase output symbol of the narrow band reference signal

nrs_i_out2 Output 16 bits In-phase output symbol of the narrow band reference signal

nrs_q_out2 Output 16 bits Quadrature-phase output symbol of the narrow band reference signal

nrs_i_out3 Output 16 bits In-phase output symbol of the narrow band reference signal

nrs_q_out3 Output 16 bits Quadrature-phase output symbol of the narrow band reference signal
nrs_i_out4 Output 16 bits In-phase output symbol of the narrow band reference signal

nrs_q_out4 Output 16 bits Quadrature-phase output symbol of the narrow band reference signal
Table 9: Interface Table of NRS generation

63
4.6.1.3. Operation

 Each slot includes 4 pilot symbols as shown in fig.89, we want to have


the reference signals of a complete sub-frame at a time.

 This was achieved by repeating the (31-bit X2 LFSR) 4 times as shown


in fig.91 :
 X2_1 : gives NRS of first slot & L = 5
 X2_2 : gives NRS of first slot & L = 6
 Similarly for X2_3 & X2_4 for the second slot
Figure 89: Two slots of the frame
 All these 4 LFSRs are initialized in parallel whenever the nrs_enable becomes
high according to the standard initialization equation.

 Then shift takes place in the X1 LFSR and in the X2 LFSRs according to equation (4) and is done for
1600 cycles. This can be noted in INITIAL SHIFT state in the FSM. Then the LSB of X2 is xored
with the LSB of X1 to get the gold sequence as shown in fig.90

Figure 90: The two LFSRs of the NRS values

 Here comes the role of the 4-bit registers (c_n_1 , c_n_2 , c_n_3 , c_n_4) which store the four output
bits from the generated sequence due to xoring X1 with (X2_1 , X2_2 , X2_3 , X2_4) respectively as
shown in fig.91.

 After storing in the 4-bit registers is done, we go into a state which waits the subframe_rem_done
signal from the element mapper module, then we go into muxing state which uses each 2 bits of the 4-
bit registers as selection lines to generate the 16-bit symbol. Otherwise, we wait in this state.

Figure 91: The 4-bit registers & LFSRs configuration

64
 The generated symbols are stored in a register file “pilots” to store the pilots of the whole
subframe and using the “toggle_nrs” signal we can select the pilots of any of the two slots.
 A signal “toggle_nrs” comes from element mapper to decide which part of the pilots memory will
be read depending on the slot (first or last 4 locations of pilots memory).
 All the previous steps are to be repeated once a signal “subframe_rem_done” becomes high
which comes from the element mapper.

Figure 92: Pilots memory elements

4.6.1.4. FSM

A finite state machine is needed to organize the steps of generating the pilots, an illustrative summarized
diagram of the controller status & control signals are shown in the below figure

Figure 93: illustrative diagram of datapath & controller of NRS values

65
Figure 94: NRS_Generation FSM State Diagram

Current State Condition (Status) Next State Output (Control)

NRS Generation block is Initializing the LFSRs is


IDLE INITIAL
enabled enabled

Initial 1600 shift cycles counter


INITIAL Initializing the LFSRs is done INITIAL_SHIFT
is enabled

Initial 1600 shift cycles counter XORING & storing in 4-bit


INITIAL_SHIFT STORE
is done registers are enabled

XORING & storing in 4-bit XORING & storing in 4-bit


STORE HLT_MUXING
registers are done registers are disabled

A subframe is done AND we are


not in the first subframe
HLT_MUXING MUXING -
OR

We are in the first subframe

Muxing each 2-bits & Storing in


IDLE -
the pilots memory are done
MUXING
Muxing each 2-bits & Storing in Muxing & Storing in the pilots
MUXING
the pilots memory are not done memory is enabled

Table 10: State Table of NRS Generation FSM

66
4.6.2. NRS Location

4.6.2.1. Block Diagram

Figure 95: Interface Diagram of NRS Index

4.6.2.2. Interface Table

Port Direction Width Description


clk Input 1 bit System clock
rst_n Input 1 bit Block reset
mapper_en Input 1 bit Block enable
n_cell_id Input 9 bits Physical layer cell identity given from upper layer

index1 Output 4 bits Determines the location of a pilot to be in which row, ranged from
0 to 11 where the row refers to the subcarrier

index2 Output 4 bits Determines the location of a pilot to be in which row, ranged from
0 to 11 where the row refers to the subcarrier

index3 Output 4 bits Determines the location of a pilot to be in which row, ranged from
0 to 11 where the row refers to the subcarrier
index4 Output 4 bits Determines the location of a pilot to be in which row, ranged from
0 to 11 where the row refers to the subcarrier
Table 11: Interface Table of NRS Index

4.6.2.3. Operation
 Since the algorism of the code that generate the index contains complex operations like multiplication
and mod of constant value, so we replace them with the equivalent values.
 First: generate the mod operation for all the expected values of the Ncell and save it in a memory
(we will have 6 output values as we make mod6)
 Second: generate the values of the index0, index1, index2 and index3 from their equations in
MATLAB so each output of the mode operation has its own 4 index values.

 So when the Ncell enter the design we will find the mode output from a memory inside, then this
mode value have the 4 values of the index which is the output of the design.

67
4.6.3. NPSS

4.6.3.1. Block Diagram

Figure 96: Interface Diagram of NPSS

4.6.3.2. Interface Table

Port Direction Width Description


clk Input 1 bit System clock
rst_n Input 1 bit Block reset
npss_en Input 1 bit Block enable comes from element mapper module
i_npss Output 16 bits Narrowband Primary Synchronization signal input in-phase
symbol from the NPSS module

q_npss Output 16 bits Narrowband Primary Synchronization signal input quadrature-


phase symbol from the NPSS module

npss_valid Output 1 bit Valid signal with output NPSS symbol


Table 12: Interface Table of NPSS

4.6.3.3. Operation

 The idea is that the equation contains exponential operation, so it is calculated using MATLAB
and then replaced with a ROM in RTL containing the values.

 When the enable signal is given to the block, it starts to read all the values in the ROM at every
two clock cycles to enter the resource element mapper with the same rate like modulator

68
4.6.4. Element Mapper
4.6.4.1. Interface Table
Port Type Width Description
clk Input 1 bit System clock
rst_n Input 1 bit Block reset
mapper_en Input 1 bit Block enable
Input 1 bit Valid signal with output symbol sent by modulator
mod_valid
Input 16 bits
mod_i In-phase input symbol from the modulator
Input 16 bits Quadrature-phase input symbol from the modulator
mod_q
nrs_i_out1 Input 16 bits In-phase input symbol from the NRS generation
nrs_q_out1 Input 16 bits Quadrature-phase input symbol from the NRS generation
nrs_i_out2 Input 16 bits In-phase input symbol of the narrow band reference signal
nrs_q_out2 Input 16 bits Quadrature-phase input symbol from the NRS generation
nrs_i_out3 Input 16 bits In-phase input symbol from the NRS generation
nrs_q_out3 Input 16 bits Quadrature-phase input symbol from the NRS generation
nrs_i_out4 Input 16 bits In-phase input symbol from the NRS generation
nrs_q_out4 Input 16 bits Quadrature-phase input symbol from the NRS generation
index1 Input 4 bits Determines the location of a pilot to be in which row, ranged from 0 to 11 where the row refers to
the subcarrier

index2 Input 4 bits Determines the location of a pilot to be in which row, ranged from 0 to 11 where the row refers to
the subcarrier

index3 Input 4 bits Determines the location of a pilot to be in which row, ranged from 0 to 11 where the row refers to
the subcarrier

index4 Input 4 bits Determines the location of a pilot to be in which row, ranged from 0 to 11 where the row refers to
the subcarrier
i_npss Input 16 bits Narrowband Primary Synchronization signal input in-phase symbol from the NPSS module

q_npss Input 16 bits Narrowband Primary Synchronization signal input quadrature-phase symbol from the NPSS
module

ifft_valid Input 1 bit Valid signal sent from the IFFT block
Input 1 bit Sent from IFFT to indicates it is ready to receive the next OFDM symbol
ifft_ready

rm_ctrl_en Input 1 bit Control signal sent from the system’s FSM to make the block send rm_ctrl signal to rate matcher

rm_data
Input 1 bit Control signal sent from the system’s FSM to resource element mapper to make the rate matcher
out_en start sending data

69
rem_stop Input 1 bit Control signal sent from the system’s FSM when the whole operation is done to stop the IFFT
block

rm_ctrl Output 1 bit Control signal sent to rate matcher to send 20 or 24 bits
rm_data_out Output 1 bit Control signal sent to rate matcher and FSM to make rate matcher start sending data

i_out Output 16 bits In-phase output QPSK symbol sent to IFFT

q_out Output 16 bits Quadrature-phase output QPSK symbol sent to IFFT

hlt_npss Output 1 bit Signal sent to FSM to stop sending bits until NPSS is done

npss_en Output 1 bit Signal sent to enable the NPSS module

toggle_nrs Output 1 bit Signal sent to NRS generation module to decide which part of the pilots memory will be read
depending on the slot (first or last 4 locations)

Subframe Output 1 bit Signal sent to NRS generation indicating that a sub-frame is done to generate the new pilot
symbols for the next subframe
rem_done

Ifft_enable Output 1 bit Signal sent to enable the IFFT one cycle before sending the symbols
load

ifft_stop Output 1 bit Sent to ifft when the rem_stop signal reaches from the FSM
Table 13: Interface Table of Element Mapper

4.6.4.2. Operation

The element mapper module is considered the brain of the resource allocation in NB-IoT LTE. It
takes the responsibility of the actual mapping of data or pilots or synchronization, also it has
status/control signals from/to the system’s finite state machine, rate matcher, and IFFT.
It is mainly composed of memories and different counters to help in the mapping process and
here is a brief description of some of these counters
 Counting Sub-carriers (Frequency domain) :

 It counts the rows of the NB-IoT physical resource block (PRB).


 Each two clock cycles, a QPSK symbol is stored inside a memory and the counter resets
automatically.
 In case of NRS, the counter counts are reduced by 2.

 Counting OFDM Symbols (Time domain) :

 It counts the columns of the NB-IoT physical resource block (PRB).


 It counts the 7 OFDM symbols whether these symbols represents data symbols from the
modulator (first 5 OFDM symbols) or pilot symbols from NRS generation & Index modules (last
2 OFDM symbols) or synchronization symbols from NPSS.

70
 When one slot is done, toggle a bit is sent to NRS generation module to select the pilot symbols
for this slot.

 Slot Counter :

 It is needed to know the slot number which will be used to map the NPSS symbols in their
locations and also to send a signal indicates that the subframe is done for NRS generation module
to exit from the state of “HLT_MUXING” and store the new pilots.

 Output Counter :

 When the IFFT block sends “ifft_ready” signal, the “ifft_enable_load” will be sent to IFFT and is
high for one clock cycle, so in order to send the 16 QPSK symbols we need a signal to be high for
16 clock cycles “load_ifft”, this counter counts these 16 output cycles.
 For the first QPSK symbol, the IFFT doesn’t send “ifft_ready”, so when the first symbol is ready
it will be sent directly and the signal “first_symbol_ready” is high when the first column is done.

 NPSS Counter and others ..

71
4.6.5. Resource Element Mapper TOP LEVEL

4.6.5.1. Block Interface

Figure 97: Block Interface of Resource Element Mapper

4.6.5.2. Architecture

Figure 98: Top Level of Resource Element Mapper

72
4.6.5.3. Interface Table

Port Direction Width Description


clk Input 1 bit System clock

rst_n Input 1 bit Block reset

nrs_enable Input 1 bit NRS Generation module enable enable

mapper_en Input 1 bit Block enable (for NRS index & Element mapper) modules

n_cell_id Input 9 bits Physical layer cell identity given from upper layer

mod_valid Input 1 bit Valid from modulator block accompanied with the sent symbol

rm_ctrl_en Input 1 bit From FSM to make the block send rm_ctrl signal to rate matcher

rm_data_out_en Input 1 bit From FSM to make the block send rm_data_out signal to rate matcher
mod_i Input 16 bits In-phase input QPSK symbol from modulator

mod_q Input 16 bits Quadrature-phase input QPSK symbol from modulator

ifft_ready Input 1 bit Ready signal from IFFT (ready to receive a symbol)

ifft_valid Input 1 bit Valid signal sent from the IFFT block

rem_stop Input 1 bit Control signal sent from the system’s FSM when the whole operation is
done to stop the IFFT block

i_out Output 16 bits In-phase output QPSK symbol to IFFT

q_out Output 16 bits Quadrature-phase output QPSK symbol to IFFT


hlt_npss Output 1 bit Signal sent to FSM to stop sending bits until NPSS is done

rm_ctrl Output 2 bits Control signal sent to rate matcher to know whether sent 20 or 24 bits

rm_data_out Output 1 bit To rate matcher and FSM to make rate matcher start sending data

ifft_enable_load Output 1 bit Signal sent to the FSM to enable the IFFT to be ready to receive them

ifft_stop Output 1 bit Sent to ifft when the rem_stop signal reaches from the FSM
Table 14: Interface Table of REM top level

73
4.6.5.4. Complete Frame Scenario

 At first, the incoming QPSK symbols from the modulator are stored in a memory element, since each
symbol is formed after 2 clock cycles, therefore the OFDM symbol takes 24 clock cycles to be stored.

 After one cycle, the OFDM symbol is stored in another memory waiting for the “ifft_ready” to send
the In-phase & Quadrature symbols. This operation lasts for the first 5 OFDM symbols in the first
slot.

 The last 2 OFDM symbols includes NRS symbols (pilots) and only 10 QPSK symbols from
modulator to complete these OFDM symbols.

 The previous operation lasts for the first 10 slots (from 0 to 9).

 In slots 10 & 11, NPSS symbols are mapped with no pilots.

 In slots 12 to 19, the same mapping of QPSK symbols & NRS symbols takes place.

 From the system’s FSM, this block uses three status signals as an input :

 rm_ctrl_en : the resource element mapper sends the number of bits configuration “rm_ctrl”
(24 or 20 bits) to the Rate Matcher for one clock cycle.
 rm_data_out_en : the signal “rm_data_out” is sent to the Rate Matcher in one of the
following cases :
a) When “rm_data_out_en” is high and “ifft_ready” is high

b) When “rm_data_out_en” is high and (we’re mapping the first two OFDM symbols)

 rem_stop : to stop the IFFT block when the whole operation is done

 To the system’s FSM, this block sends three control signals as an output :
 hlt_npss : to stop sending data from modulator until NPSS is mapped.

 ifft_enable_load : to enable the IFFT to be ready to receive the outcoming symbols from
REM.
 rm_data_out : to make rate matcher start sending data bits according the bit-configuration.

 Block Latency : 1625 clock cycles

74
4.6.6. Simulation results of NRS Generation

First, we compared our function to the MATLAB built-in function and it results in identical pilots values
at cell id = 300 & sub-frame number 3.

Figure 99: MATLAB comparison of NRS values at cell_id=300 & subframe=3

After that, the RTL was tested using the same parameters (cell id = 300 & sub-frame number 3) and it is
clear that at slot 6 (subframe 3) the pilots sign is identical to that generated from Matlab.

Figure 100: RTL results of NRS values at cell_id=300 & subframe=3

75
Finally, a whole frame (10 sub-frames) is tested to check its pilots values, we used our MATLAB
function to verify our RTL using automated test-bench.

Figure 101: Testing a wholre frame pilots using automated testbench

4.6.7. Simulation results of NRS Index

We compared the MATLAB function with the generated values from RTL at the same cell ID and it is
noticeable that the indices are identical

Figure 102: RTL results of NRS indices at cell_id=300

Figure 103: Matlab results of NRS indices at cell_id=300

76
4.6.8. Simulation results of the TOP module

In the figure below, we can notice that the output QPSK data symbols from the modulator are mapped
and delivered to the IFFT when it requests that symbol

Figure 104: Output of the REM is the QPSK symbols requested by the IFFT

In the figure below, a whole frame (20x7x12 = 1680 QPSK symbols) is tested using automated test-bench
and all test cases passed which includes (Data, Pilots, and Synchronization)

Figure 105: Test cases for a complete frame using automated testbench

77
4.6.9. Synthesis Results

Figure 106: Timing_max (setup time) report of REM

Figure 107: Area report of REM

Figure 108: Power report of REM

78
4.7. IFFT & CP

IFFT is an algorithm used to compute IDFT to convert signals from frequency domain to time domain.
IDFT is a useful operation but computing it directly from its definition is not a practical method, so we go
to IFFT which rapidly computes the operation by breaking IDFT into smaller IDFTs (dived and conquer
algorithm). Which decreases the complexity of computing from O(N2) of IDFT to O(N log2N) of IFFT
where N is the data size.
1
IFFT equation 𝑋𝑘 = ∑𝑁−1
𝑛=0 𝑥𝑛 𝑒
𝑗2𝜋𝑘𝑛/𝑁
𝑤ℎ𝑒𝑟𝑒 𝑘 = 0, … , 𝑁 − 1,
𝑁

The most common algorithm of IFFT is the Cooley–Tukey algorithm, where radix-2 is the simplest form
of the algorithm which is used in this thesis, there are two forms in radix-2 which are decimation in time
(DIT) and decimation in frequency(DIF). For DIT the input is bit reversed while the output is a natural
order. While DIF the input is in natural order while the output is bit reversed order, and the DIF butterfly
is slightly different from DIT where in DIF the complex multiplication takes place after the add-subtract
operation. But both requires N log2N operation to compute the DFT. Both algorithms can done in-place
and both need to perform bit reversal at some place during the computation, the below figures show the
two forms, in our thesis we choose DIT algorithm.

Figure 109: Decimation in Frequency

79
Figure 110: Decimation in Time

The most common architecture to implement the IFFT is Memory-based architecture and pipeline
architecture we will take single-delay feedback (SDF) as an example of the pipeline architecture, here is a
comparison between them.

Point of Comparison Memory-based SDF

Complex multipliers 1 3

Complex Adders 2 8

RAM Size (Depth) 16 15

latency(cycles) 34 16

Table 15: Comparison between Memory-based and SDF architectures

This comparison based on 16-point IFFT radix-2.

80
Memory-based architecture
In NB-IOT the target devices are intended to operate for a long time, so we want to implement a design
has a long-life battery with low power consumption, so we choose Memory-based architecture to be
implemented due its small area which will reflect in power consumption unlike the single-delay feedback
architecture which consume more power.
This architecture consists of four single port RAMs, seven multiplexers, two adders, one multiplier, one
ROM, and one controller. The four single-port RAMs are used for buffering the computational data. The
multiplexers are responsible for switching the data flow between the storage and arithmetic components.
The adder and multiplier execute the computation of the two-point IFFT.
The ROM stores the twiddle factors. The addend and augend of the left adder can be changed by
controlling the Ch signal. For example, if Ch=0, then the adder executes. However, if Ch=1, then the
adder executes. The controller, which is not shown in Figure 111, generates the controlling signals for the
multiplexers and the four RAMs, eventually, the output is shift right by 16 as the IFFT equation stated.

Figure 111: Memory-based Architecture

81
The signal flow graph explains the operation of the proposed memory-based FFT processor as shown in
Figure 112.
At first 8 cycles the first 8 frequency domain signals are stored in memory banks A & C, then in each
cycle a frequency domain signal is added and subtracted from the corresponding signal which is stored
previously in the memory banks, this will lead to get out 2 signals each cycle, so the second 8 cycles will
generate 16 output signal from the first stage where the last 8 signals are multiplied with their
corresponding twiddle factor as shown in the signal flow graph which is follow the DIT butterfly
algorithm. The process is going on with divide and conquer algorithm so in the second stage the upper 8
signals operated separately from the low 8 signals and so on until x[0] is got out in the cycle number 34
and the rest of the output will get out continuously until x[15] is got out at cycle number 48.

Figure 112: Signal Flow Graph for 16-point FFT

As there are nearly 16 control signals for {Ch, Ch1, Ch2, Ch3, Ch4, Ch5, Ch6, Address A/B/C/D,
Write_enable A/B/C/D, Address for ROM} each vary in each cycle from the 48 working cycles so we
have nearly 768 different values for the control signals which will not be a good idea to write them value
by value in the FSM of the IFFT.

82
So we write the required value in a text file, then by a PYTHON we write a script to read this text file to
generate an automated Verilog code, so any required modification we just change in the text file and re-
run the PYTHON script to generate the modified Verilog code.

Figure 113: Automation

4.7.1. Block Interface

Figure 114: IFFT & CP Block Diagram

83
4.7.2. Interfaces Table

Port Direction Width Description

CLK Input 1-bit System clock

RST Input 1-bit Asynchronous negative reset

EN Input 1-bit Enable signal to operate the IFFT, it’s connected from the REM
Data_in Input 32-bit It’s concatenation of the I and Q of the input data from the
REM, first 16 bits are the I while the last 16 bits are the Q

Stop Input 1-bit It’s the signal which indicate that there no more data, so the
valid will be zero after extracting the current OFDM symbo

Data_out output 32-bit It’s concatenation of the I and Q of the output data from the
REM, first 16 bits are the I while the last 16 bits are the Q
Valid output 1-bit It’s the signal which indicates that the output data is valid to be
read by the DAC

IFFT_read output 1-bit To read a new data from the REM.


Table 16: Block Interfaces

Architecture

Figure 115: IFFT & CP Architecture

84
4.7.3. Blocks Description

The IFFT block which takes the input data and convert it to time domain with the pre-mentioned
Memory-based architectures which is controlled by a FSM that is embedded in the IFFT block.
The write FSM is responsible for writing the output data from IFFT in the two buffers with an ordered
manner.
The Read FSM is responsible for read the data from the buffer with their pre-arranged manner, and
implicitly it’s the block which add the cyclic prefix as it get the end of the data at the start of the OFDM
symbol and then extract data in sequence.
The Valid FSM is responsible for indicating the output data validity, to be read by the next block which is
the DAC.
The Output buffers are implemented for two reasons, first to reorder the output data as they are got out
with a bit reversal order as we use DIT algorithm (input are ordered while the output is bit reversed),
second reason to add the cyclic prefix as the data will be buffered so we can copy the end of the data to be
get out as a prefix for the OFDM symbol.

4.7.4. Operation

First “EN” signal is triggered to be high then the data is get in from the next cycle of “EN” triggering
until the 16 symbol enter the IFFT block, as mentioned before the first 8 cycles of the operation the IFFT
is only storing the symbols in RAM A and RAM B, then from the 9th cycle the addition, subtraction and
multiplication are applied on the data, till reach the 34th cycle at which the data are get out from the IFFT
sample by sample, at the same time the “get_in” signal is triggered to activate the WRITE FSM to start
write the data in the output buffer1.
Till now the “valid” signal is still zero, and the data is stored in output buffer 1, then new data are coming
after 137 cycles from the first data (we will clarify why 137 cycle below), the “IFFT_read” is the signal
which is responsible for claiming new data, then the new data take the same process as the previous data,
while the stored data are getting out from the upper output buffer by the control of the READ FSM, and
now the valid signal is activated to indicate that the output data is valid to be read.
The data in buffer 1 are read out in 137 cycles as follow, first 9 cycles are cyclic prefix while the next 128
cycles are useful data, the 2nd input data are stored in the output buffer 2 when they are processed by the
IFFT block this storing is done within the 137 cycles of read the data in buffer 1, and the process go on as
described, one buffer is storing while the other is extracting the data, they work alternately.
“buffer_sel” signal plays a vital role in alternate between reading and writing in the output buffers, this
signal is generated from the READ FSM through an internal counter.
Eventually, when the data in the chain are read a “stop” signal will be activated from the system FSM, so
the IFFT extract the last OFDM symbol then the “valid” is deactivated.

85
Figure 116 describes the time line of the expected system output.

Figure 116: IFFT & CP time-line

The reason why we choose 137, is to achieve the cyclic prefix time which is determined in the standard,
and here is a clarification.
Assume ‘x’ is repeated samples of the cyclic prefix, and ‘y’ is repeated samples of the data.
According to the standard, cyclic prefix duration is 4.69 µs while data duration is 66.67 µs
𝑥 4.69
=
𝑥 + 16𝑦 4.69 + 66.67
To obtain these durations, we use the above equation, and by trial and error we get that
x = 9, when y = 8
This means that the total number of cycles to send an OFDM symbol is 9+16*8 = 137 cycle, the first 9
cycles is for cyclic prefix and the last 128 cycles is for data, so we’ll copy the tail of the OFDM symbol to
the beginning of it, as shown in Figure 117.

Figure 117: OFDM Symbol's 137 cycles

86
4.7.5. Clock Determination

As the IFFT is the last block in the chain, so it’s the bottleneck which determine the output rate, so to
achieve the OFDM symbol duration according to the standard which is 71.36µs.
137 ∗ 𝑇𝑐𝑙𝑘 = 71.36𝜇𝑠
𝑓𝑐𝑙𝑘 = 1.92𝑀𝐻𝑧
We operate the whole chain with this frequency and depend on halting some blocks by the system FSM if
these blocks can operate with higher frequency.

4.7.6. Future Work

In this thesis we present the IFFT and CP design which extracts the data and stands for 8 cycles for each
sample, which is not a practical situation especially from the communication wise, so it’s recommended
to implement 128-point IFFT instead of 16-point, by insert the data in 12 symbols to IFFT (as it is the
allocated number of subcarriers according to the standard) and the other symbols are padded with zero,
then copy the last 9 samples of the 128 output samples from the IFFT as a cyclic prefix, it’s the ideal
situation so the design is ready to get out as a real product.

87
4.7.7. Simulation Results
Before presenting the simulation and results, we should first present the verification procedure which
facilitate the way to simulations and get accurate results.
1- We generate random complex numbers using MATLAB and export these numbers in two formats
fixed point format and floating point format, all the generated input are combinations of ±1/sqrt(2)
because our modulation scheme is QPSK, as shown in Figures 118, 119.

Figure 118: Generated Input (fixed)

Figure 119: Generated Input (float)

88
2- We take the generated input which is in fixed format to a Python script so it generates a Verilog code
that can be inserted in the testbench as in Figure 120.

Figure 120: Generated Test-bench fragment

3- We also generate the expected output from the IFFT in a text file in floating point format as shown in
Figure 121, here there is no need to generate the fixed format.

Figure 121: Expected Output (float)

89
4- Then we write a testbench which generates the output data when valid is high in a text file, which for
sure in fixed format as we deal with a digital system, then this file is passed to a MATLAB script to
convert it to a floating format.

5- Eventually, we pass the file which generated from the testbench (generated output file) and the
expected file which is generated from MATLAB (expected output file) and compare them to see the
final results. (We will present the results at the end of this section).

Waveforms

Figure 122 below shows that “EN” is activated then input data are read by the IFFT, at the same time the
2nd buffer is extracting the data (as discussed before).
The “valid” signal is activated with the beginning of the OFDM symbol.
The highlighted samples are the cyclic prefix which is 9 cycles (1 cycle for x(14) and 8 cycles for x(15))
then the data are extracted in arranged sequence from x(0) to x(15).

Figure 122: IFFT output to clarify the CP

Figure 123 below shows that the cyclic prefix duration is 4.69µs as stated in the standard.

Figure 123: CP duration

90
Figure 124 shows the OFDM symbol duration is 71.3 which is expected as we operate the system with
frequency 1.92MHz and extract the output in 137 cycles.

Figure 124: IFFT output to clarify the OFDM duration

Figure 125 shows the “Data_out” is extracted without any bubbles (in continues manner), first 9 cycles
are the cyclic prefix while the last 128 cycles are the useful data, each is extracted in 8 cycles.

Figure 125: IFFT output to clarify the continuity of the output

Figure 126 shows IFFT operation for 20 OFDM symbols.


Also a “stop” signal is activated at the end of the last OFDM symbol, and it’s shown that when the
OFDM symbol completely extracted the valid signal is deactivated.

Figure 126: IFFT output for 20 OFDM symbols

91
Results
And finally the comparison file, as we see the expected output is nearly equals the generated output, they
are not exactly equal each other’s due to two reasons:
1- Quantization error as we only represent the data in 16 bits (for either real part or imaginary part).
2- Truncation of multiplication output which is expected to be 32 bits as we multiply 16 bits with each
other, but we have a limited RAMs with 16 bits width, which force us to truncate the data to be 16 bits.
If we increase the length of the represented data bits we will consume more power as larger multiplier and
adders are needed and also we will need more area as the RAMs will be wider, but our goal from the
beginning is to get the minimum power and area as possible.

Figure 127: Final Comparison File

92
4.7.8. Fraction bits number determination

Real numbers can be represented in digital systems by two representation:


1) Floating-point representation: The floating-point representation breaks the number into two
parts, the left-hand side is a signed, fixed-point number known as a mantissa and the right-hand
side of the number is known as the exponent.

𝑀𝑎𝑛𝑡𝑖𝑠𝑠𝑎 ∗ 2𝑒𝑥𝑝𝑜𝑛𝑒𝑛𝑡
It’s a complex representation which need a complex units to deal with to get very accurate results, which
will lead to consumption of more power and increasing in area, while our target in NB-IOT is to achieve
minimum power and area with a good performance.

2) Fixed-point representation: Fixed-point numbers having decimal points at the right end of the
number are treated as integers because the fixed-point numbers having decimal points at the left
end of the number are treated as fractions.
We can represent these numbers using:

 Signed representation: range from -(2(k-1)-1) to (2(k-1)-1), for k bits.


 1’s complement representation: range from -(2(k-1)-1) to (2(k-1)-1), for k bits.
 2’s complementation representation: range from -(2(k-1)) to (2(k-1)-1), for k bits.
2’s complementation representation is preferred in computer system because of unambiguous property and
easier for arithmetic operations.
In this thesis we work with 2’s complementation representation (fixed-point).

To know how number will represent either the integer bits or the fraction bits, we should now first the
range of numbers which we deal with, which will be constrained by the IFFT output.
We generate random QPSK symbols and apply IFFT operation using MATLAB to determine the ranges
of IFFT output, we get that the number ranges is between -0.65 and +0.65, by putting some margin we will
assume that the range is between -0.7 and +0.7.
1
Since, IFFT equation is 𝑋𝑘 = 𝑁 ∑𝑁−1𝑛=0 𝑥𝑛 𝑒
𝑗2𝜋𝑘𝑛/𝑁
, the pre-mentioned range is determined after divide the
summation by N, so before this division the range was -0.7*N and +0.7*N which is -11.2 and +11.2, hence
this range will appear in the intermediate values in the butterfly unit, so at least we must allocate 5-bit for
integer to fit this range.

93
By taking the range ±0.7, we run a MATLAB script to calculate the error due to quantization, to
determine the fraction bits according to the error.

Figure 128: Quantization error vs Fraction length

To calculate this error, we generate range of numbers between -0.7 and 0.7, then re-represent these
numbers using a 32-bit word with different fraction length as shown in Figure 128.
Then we get the error due to the quantization and get the mean of error at each fraction length, then plot
the above curve.
We noted that the error is decreased exponentially with the fraction bits, and we will get more accurate
result as we increase the number of bits to represent the fraction part, but on the other hand we will
consume more power and more area, while our target in NB-IOT is low power and area, so we assumed
that 11 bits is sufficient to represent the fraction part, which corresponds to a mean error of 0.0001224, so
the total bits to represent our real number are 16 bits in fixed-point representation

94
4.7.9. Synthesis Results

Area report in Figure 129, Report separates the area of combinational and sequential logics, also the
interconnect area (wiring area) which has the biggest share.
Total cell area is also reported which include sequential and combinational logics area.

Figure 129: Area report of IFFT & CP

As shown in Figure 130, the timing of the design is met with too large slack, which indicates that the
design may operate at a higher frequency, but we are constrained to this frequency to achieve standard
specification.

Figure 130: Timing_max (setup time) report of IFFT & CP

95
Switching power is caused by the charging and discharging of the load capacitance in the circuit when the
circuit output switches from high to low and from low to high during the normal operation of CMOS
1
gates, where 𝑝𝑠𝑤𝑖𝑡𝑐ℎ𝑖𝑛𝑔 = ∝ 𝐶𝐿 𝑉𝐷𝐷2 𝑓
2

Internal power is power dissipated within the boundary of a cell. During switching, a circuit dissipates
internal power by the charging or discharging of any existing capacitances internal to the cell. Internal
power includes power dissipated by a momentary short circuit between the P and N transistors of a gate,
called short-circuit power.
Leakage power is considered a static power which is defined as the power consumption due to constant
current from VDD to GND in the absence of switching activity.

Figure 131: Power report of IFFT & CP

96
Chapter Five

5. System Integration

In this chapter we will discuss briefly the NB-IoT integrated transmitter results.
We integrated the system in a hierarchical approach, where each 2 successive blocks are
combined in a top level and so on till we reach the state that we had two halves of the chain,
upper layer register file, and the controller.
The final step was to integrate the system as shown in the figure below, which took a lot of time
to achieve the standard specs.

Figure 132: Illustrative Diagram of our integration approach

97
5.1. NB-IoT LTE Transmitter TOP LEVEL

Figure 133: NB-IoT Tx Top Level

5.2. Top Level Interface Table

Port Type Width Description


clk Input 1 bit System clock
rst_n Input 1 bit Active Low Asynchronous Chip reset
start_top Input 1 bit Interface Signal that starts or enables the chip
ram_write_en Input 1 bit Write enable to write the upper layer data into the RAM
ram_data_in Input 32 bits Data written into the RAM from the upper layer
ram_write_addr Input 7 bits Address of data RAM from the upper layer
tbs_mac Input 12 bits Transport Block Size, its maximum value is 2536
rnti_mac Input 16 bits Radio Network Temporary ID given from upper layer.
nf_mac Input 10 bits First system frame number given from upper layer.
ns_mac Input 5 bits First slot number given from upper layer.
n_cell_id_mac Input 9 bits Physical layer cell identity given from upper layer.
data_out Output 32 bits Output I & Q symbols from the Transmitter chain
data_out_valid Output 1 bit Valid output, asserted high with the output symbols
Table 17: Top Level Interface Table

98
5.3. Detailed NB-IoT LTE Transmitter Chain

Figure 134: Detailed Architecture

99
5.4. System’s Finite State Machine

Figure 135: Datapath & Controller Diagram

Figure 136: Generated & Expected final outputFigure


137: Datapath & Controller Diagram

100
5.5. Simulation Results

First, we integrated the MATLAB functions of the whole chain by generating an input stream
inside the RAM which then interfaces with the P/S to propagate through the whole chain.
The generated output from the RTL is verified using the integrated MATLAB expected output they
are almost identical.

Figure 138: Generated & Expected final output

101
Also, we make sure that the NB-synch signals are mapped successfully in the integration step,
where the NPSS occupies 11 OFDM symbols and the rest are assumed to be zeros.

Figure 139: NPSS in final comparison file

102
 When the system starts “start_top = 1”, the data propagates through the P/S then CRC
then encoder and the following waveform shows that the rate matcher has received its 3
input streams successfully

Figure 140: Integration of the first half of the chain

 When the Resource Element Mapper sends a “stop” signal to the IFFT indicating that it is
the last symbol to be sent, the IFFT receives that symbol and de-asserts the “data_valid”
signal. Now the Resource Element Mapper can de-assert the “stop” signal after this
successful handshaking.

Figure 141: Handshaking between Resource Element Mapper and IFFT

103
5.6. System Latency

 Calculated :

Latency = 1 (RAM) + 1 (P/S) + [TBS] (CRC) + 1 (RAM) + 1 (P/S) + 1 (Encoder) + [TBS+25] (Rate
Matcher) + 1 (Scrambler) + 1 (Modulator) + 24 (REM) + 137 (IFFT)
2113∗520
For TBS = 960, Latency = = 𝟏. 𝟎𝟗𝟖𝟕𝟔 𝒎𝒔
106

 From Waveform :
The latency is 1.11098 ms (difference between “data_out_ifft” assertion and de-assertion)
Also the below figure shows that we achieved the specified time by the standard for the
NPDSCH for one frame

Figure 142: The frame time specified by the 3GPP

104
5.7. Synthesis Results

5.7.1. Whole Chain ASIC Synthesis Results

Synthesis is the process that translate and map RTL code written in HDL into a technology specific gate-
level netlist, optimized for a set of pre-defined constraints.
You start with: behavioral RTL design, standard cell library & set of design constraints.
You finish with: gate-level netlist, mapped to the standard cell library Timing, Area and Power reports.
In this thesis we work with TSMC 130 nm technology, we operate at 1.92MHz, such factors are critical in
determination of area, power and speed of the system.

Figure 143: Total Power for the integrated system

105
Figure 144: Area report for the integrated system

Figure 145: Timing_max (setup) report for the integrated system

106
5.7.2. Summary for Power and Area results

Figure 146: Power Results for all blocks

Figure 147: Area Results for all blocks

107
5.8. Full System Vivado Utilization for Vertex-7

Figure 148: : Vivado Utilization for Vertex-7 for the integrated chain

108
Chapter Six

6. FPGA Implementation
The whole chain is now implemented on the Vertex-7 FPGA and we interface with the kit through the
differential clock pins as shown in Figure 149.

Figure 149: FPGA Architecture

The IBFGDS is a block used to convert differential pair clock signal into a single clock signal.
The virtual input/output (VIO) core is a customizable core “that both monitor and drive internal FPGA
signals in real time.
The integrated logic analyzer (ILA) feature allows you to perform in-system debugging of post-
implemented design on an FPGA device you can use it to monitor signals in design.
We modify the system so it has an embedded ROM stores all the data instead of write them in a
testbench, and only a “start” signal is triggered so the system starts to read the data from the ROM to pass
throw the chain until get out from the IFFT to be captured by the ILA ip.

109
To run the system we reset, enter the upper layer parameters then trigger the “start” signal, all this signals
are triggered through the VIO in the GUI in Vivado as shown in Figure 150.

Figure 150: VIO

Then the ILA capture the data, and draw a wave form as shown in Figure 151.

Figure 151: ILA waveform (at the beginning)

Then when the chain finishes processing on data, the valid signal is deactivated as expected, see Figure
152.

Figure 152: ILA waveform (at the end)

110
Conclusion

In this thesis we introduced the design and implementation of the physical layer of downlink
shared channel for the narrow-band IoT LTE Transmitter chain. The full transmitter chain was
introduced then we started with some theory concepts then the 3GPP Release-14 in which our
design meets this standard, after that we walked through the MATLAB modeling for the
transmitter’s blocks and RTL implementation which is verified using the MATLAB’s built-in
functions. Then we integrated the whole system including the RAM which stores the incoming
transport block size, also the register file which contains the upper layer parameters. Finally we
synthesized the blocks and the integrated system on ASIC DC compiler using tsmc 130 nm
technology and reported slacks, area, and power, then we implemented the system of Vertex-7
FPGA.

111
References

[1] Evolution of NB-IoT and its implementation, simnovus.

[2] core.ac.uk/download/pdf/154281786.pdf

[3] 3GPP Technical Specifications 36.211, “Physical channel and modulation.

[4] ieeexplore.ieee.org/abstract/document/231911
[5] 3GPP Technical Specifications 36.212, “Physical channel and modulation.

[6] 3GPP Technical Specifications 36.213, “Physical channel and modulation.

[7] www.semanticscholar.org/paper/Performance-of-a-fixed-delay-decoding-scheme-for-Sung-
Kim/f4892d662af80524d7649c20cec42c0975c21cf1
[8] Design of Cost-Efficient Memory-Based FFT Processors Using Single-Port Memories Yao-
Xian Yang, Jin-Fu Li, Hsiang-Ning Liu, and Chin-Long Wey Department of Electrical
Engineering National Central University Jhongli, Taiwan, 320
[9] ieeexplore.ieee.org/document/8306882
[10] International Journal of Computer Applications (0975 – 8887) Volume 116 – No. 7, April
2015
[11] tutorialspoint.com/what-is-fixed-point-representation-in-computer-architecture
[12] ieeexplore.ieee.org/document/9194757
[13] tutorialspoint.com/what-is-floating-point-representation-in-computer-architecture
[14]
rfmw.em.keysight.com/wireless/helpfiles/89600b/webhelp/subsystems/lte/content/lte_overview
[15] ieeexplore.ieee.org/document/5231183
[16] 3rd Generation Partnership Project; Technical Specification Group Radio Access Network;
Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation
(Release 14) V14.7.0 (2018-06)
[17] 3GPP TS 36.201 V14.1.0 (2017-03)
[18] onelab-eg.com/wp-content/uploads/2020/06/PDF.pdf

112

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy