
INTRODUCTION TO VLSI

Very-large-scale integration (VLSI) is the process of creating integrated circuits by
combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s
when complex semiconductor and communication technologies were being developed. The
microprocessor is a VLSI device. The term is no longer as common as it once was, as chips
have increased in complexity into the hundreds of millions of transistors.
1.1 Overview:
The first semiconductor chips held one transistor each. Subsequent advances added
more and more transistors, and, as a consequence, more individual functions or systems were
integrated over time. The first integrated circuits held only a few devices, perhaps as many as
ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more
logic gates on a single device. Now known retrospectively as "small-scale integration" (SSI),
improvements in technique led to devices with hundreds of logic gates, known as
"medium-scale integration" (MSI), and then to large-scale integration (LSI), i.e. systems with
at least a thousand logic gates. Current technology has
moved far past this mark and today's microprocessors have many millions of gates and
hundreds of millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the
huge number of gates and transistors available on common devices has rendered such fine
distinctions moot.
Terms suggesting greater than VLSI levels of integration are no longer in widespread
use. Even VLSI is now somewhat quaint, given the common assumption that all
microprocessors are VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an example of
which is Intel's Montecito Itanium chip. This is expected to become more commonplace as
semiconductor fabrication moves from the current generation of 65 nm processes to the next
45 nm generations (while experiencing new challenges such as increased variation across
process corners). Another notable example is NVIDIA’s 280 series GPU.

This GPU is notable in that its 1.4 billion transistors, capable of a teraflop of performance,
are almost entirely dedicated to logic (Itanium's transistor count is largely due to its 24 MB
L3 cache). Current designs, as opposed to the earliest devices, use extensive design
automation and automated logic synthesis to lay out the transistors, enabling higher levels of
complexity in the resulting logic functionality. Certain high-performance logic blocks, like
the SRAM cell, are still designed by hand to ensure the highest efficiency (sometimes by
bending or breaking established design rules, trading stability for the last bit of
performance).

1.2 What is VLSI?


● VLSI stands for "Very Large Scale Integration". This is the field which involves packing
more and more logic devices into smaller and smaller areas.
● Simply put, an integrated circuit is many transistors on one chip.
● VLSI is the design/manufacturing of extremely small, complex circuitry using modified
semiconductor material.
● An integrated circuit (IC) may contain millions of transistors, each only a few micrometres
(or less) in size. Applications are wide-ranging: most electronic logic devices.

VLSI and systems


These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size: Smallness is often an advantage in itself: consider portable televisions
or handheld cellular telephones.
Lower power consumption: Replacing a handful of standard parts with a single chip
reduces total power consumption. Reducing power consumption has a ripple effect on the rest
of the system: a smaller, cheaper power supply can be used; since less power consumption
means less heat, a fan may no longer be necessary; and a simpler cabinet with less
electromagnetic shielding may be feasible, too.
Reduced cost: Reducing the number of components, the power supply requirements, cabinet
costs, and so on, will inevitably reduce system cost.

1.3 Applications
● Electronic system in cars.
● Digital electronics control VCRs
● Transaction processing system, ATM
● Personal computers and Workstations
● Medical electronic systems.

LITERATURE SURVEY
2.1 Introduction
There are several techniques for generating check bits that can be added to a
message. Perhaps the simplest is to append a single bit, called the “parity bit,” which makes
the total number of 1-bits in the code vector (message with parity bit appended) even (or
odd). If a single bit gets altered in transmission, this will change the parity from even to odd
(or the reverse). The sender generates the parity bit by simply summing the message bits
modulo 2, that is, by XORing them together. It then appends the parity bit (or its complement)
to the message. The receiver can check the message by summing all message bits modulo 2
and checking that the sum agrees with the parity bit. Equivalently, the receiver can sum all
the bits (message and parity) and check that the result is 0 (if even parity is being used).
This simple parity technique is often said to detect 1-bit errors. Actually it detects
errors in any odd number of bits (including the parity bit), but it is a small comfort to know
you are detecting 3-bit errors if you are missing 2-bit errors. For bit serial sending and
receiving, the hardware to generate and check a single parity bit is very simple. It consists of
a single XOR gate together with some control circuitry. For bit parallel transmission, an XOR
tree may be used, as illustrated in Figure 2.1.

Figure 2.1: XOR tree
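As a minimal illustration of the mechanism just described (not taken from the original text), the following Python sketch generates and checks an even-parity bit by XORing the message bits together; the message values are arbitrary examples.

```python
def parity_bit(bits):
    """Even-parity bit: the XOR (modulo-2 sum) of all message bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def check_even_parity(codeword):
    """Codeword = message bits plus parity bit; valid iff the XOR of all bits is 0."""
    total = 0
    for b in codeword:
        total ^= b
    return total == 0

msg = [1, 0, 1, 1, 0, 1, 0]              # example message bits
codeword = msg + [parity_bit(msg)]       # append the even-parity bit
assert check_even_parity(codeword)       # unaltered codeword passes

corrupted = codeword[:]
corrupted[2] ^= 1                        # flip a single bit
assert not check_even_parity(corrupted)  # the single-bit error is detected
```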

Other techniques for computing a checksum are to form the XOR of all the bytes in the
message, or to compute a sum with end-around carry of all the bytes. In the latter method the
carry from each 8-bit sum is added into the least significant bit of the accumulator. It is
believed that this is more likely to detect errors than the simple XOR, or the sum of the bytes
with carry discarded. A technique that is believed to be quite good in terms of error detection,
and which is easy to implement in hardware, is the cyclic redundancy check. This is another
way to compute a checksum, usually eight, 16, or 32 bits in length, that is appended to the
message. We will briefly review the theory and then give some algorithms for computing in
software a commonly used 32-bit CRC checksum.

2.2 Theory
The CRC is based on polynomial arithmetic, in particular, on computing the
remainder of dividing one polynomial in GF(2) (Galois field with two elements) by another.
It is a little like treating the message as a very large binary number and computing the
remainder on dividing it by a fairly large prime; intuitively, one would expect this to
give a reliable checksum.
A polynomial in GF(2) is a polynomial in a single variable x whose coefficients
are 0 or 1. Addition and subtraction are done modulo 2; that is, they are both the same as the
XOR operator. For example, the sum of the polynomials x^3 + x + 1 and x^4 + x^3 + x^2 + x is
x^4 + x^2 + 1, as is their difference. These polynomials are not usually written with minus signs,
but they could be, because a coefficient of -1 is equivalent to a coefficient of 1.
Multiplication of such polynomials is straightforward. The product of one coefficient by
another is the same as their combination by the logical AND operator, and the partial
products are summed using XOR. Multiplication is not needed to compute the CRC
checksum. Division of polynomials over GF(2) can be done in much the same way as long
division of polynomials over the integers.
Below is an example: dividing x^7 + x^6 + x^5 + x^2 + x by x^3 + x + 1 gives a quotient of
x^4 + x^3 + 1 and a remainder of x^2 + 1.

We can verify that the quotient of x^4 + x^3 + 1 multiplied by the divisor of x^3 + x + 1, plus the
remainder of x^2 + 1, equals the dividend. The CRC method treats the message as a polynomial
in GF(2). For example, the message 11001001, where the order of transmission is from left to
right (110…), is treated as a representation of the polynomial x^7 + x^6 + x^3 + 1. The sender and
receiver agree on a certain fixed polynomial called the generator polynomial. For example,
for a 16-bit CRC the CCITT has chosen the polynomial x^16 + x^12 + x^5 + 1, which is now widely
used for a 16-bit CRC checksum. To compute an r-bit CRC checksum, the generator
polynomial must be of degree r. The sender appends r 0-bits to the m-bit message and divides
the resulting polynomial of degree m + r – 1 by the generator polynomial. This produces a
remainder polynomial of degree r-1 (or less). The remainder polynomial has r coefficients,
which are the checksum. The quotient polynomial is discarded. The data transmitted (the
code vector) is the original m-bit message followed by the r-bit checksum.
There are two ways for the receiver to assess the correctness of the transmission. It can
compute the checksum from the first m bits of the received data, and verify that it agrees with
the last r received bits. Alternatively, and following usual practice, the receiver can divide all
the m+r received bits by the generator polynomial and check that the r-bit remainder is 0. To
see that the remainder must be 0, let M be the polynomial representation of the message, and
let R be the polynomial representation of the remainder that was computed by the sender.
Then the transmitted data corresponds to the polynomial Mx^r - R (or, equivalently, Mx^r + R).
By the way R was computed, we know that Mx^r = QG + R, where G is the generator
polynomial and Q is the quotient (that was discarded). Therefore the transmitted data, Mx^r - R,
is equal to QG, which is clearly a multiple of G. If the receiver is built as nearly as possible
just like the sender, the receiver will append r 0-bits to the received data as it computes the
remainder R. But the received data with 0-bits appended is still a multiple of G, so the
computed remainder is still 0.
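To make the sender/receiver procedure concrete, here is a small Python sketch of the scheme described above: the sender appends r zero bits, divides by the generator, and appends the remainder; the receiver divides the whole received word and checks for a zero remainder. The generator x^4 + x + 1 and the message bits are arbitrary choices for illustration.

```python
def mod2_remainder(bits, gen):
    """Remainder of GF(2) polynomial division; bits and gen are 0/1 lists, MSB first."""
    rem = list(bits)
    for i in range(len(bits) - len(gen) + 1):
        if rem[i] == 1:                      # leading bit set: subtract (XOR) the generator
            for j in range(len(gen)):
                rem[i + j] ^= gen[j]
    return rem[-(len(gen) - 1):]             # the last r = deg(G) bits are the remainder

def crc_encode(msg, gen):
    """Sender: append r zero bits, divide by G, and append the remainder (checksum)."""
    r = len(gen) - 1
    return msg + mod2_remainder(msg + [0] * r, gen)

def crc_check(codeword, gen):
    """Receiver: the remainder of the whole received word must be all zeros."""
    return all(b == 0 for b in mod2_remainder(codeword, gen))

G = [1, 0, 0, 1, 1]                  # x^4 + x + 1, a small generator for illustration
M = [1, 1, 0, 1, 0, 1, 1, 0]
sent = crc_encode(M, G)
assert crc_check(sent, G)            # error-free transmission passes
bad = sent[:]
bad[3] ^= 1                          # a single-bit error is caught
assert not crc_check(bad, G)
```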

That’s the basic idea, but in reality the process is altered slightly to correct for such
deficiencies as the fact that the method as described is insensitive to the number of leading
and trailing 0-bits in the data transmitted. In particular, if a failure occurred that caused the
received data, including the checksum, to be all-0, it would be accepted. Choosing a “good”
generator polynomial is something of an art. Two simple observations: For an r-bit
checksum, G should be of degree r, because otherwise the first bit of the checksum would
always be 0, which wastes a bit of the checksum. Similarly, the last coefficient should be 1
(that is, G should not be divisible by x), because otherwise the last bit of the checksum would
always be 0 (because Mx^r = QG + R, so if G is divisible by x, then R must be also).
The following facts about generator polynomials are proved in [PeBr] and/or [Tanen]:
● If G contains two or more terms, all single-bit errors are detected.
● If G is not divisible by x (that is, if the last term is 1), and e is the least positive
integer such that G evenly divides x^e + 1, then all double errors that are within a
frame of e bits are detected. A particularly good polynomial in this respect is
x^15 + x^14 + 1, for which e = 32767.
● If x+1 is a factor of G, all errors consisting of an odd number of bits are detected.
● An r-bit CRC checksum detects all burst errors of length ≤ r. (A burst error of
length r is a string of r bits in which the first and last are in error, and the
intermediate r-2 bits may or may not be in error.) The generator polynomial x+1
creates a checksum of length 1, which applies even parity to the message.
● It is interesting to note that if a code of any type can detect all double-bit and
single-bit errors, then it can in principle correct single-bit errors. To see this,
suppose data containing a single-bit error is received. Imagine complementing all
the bits, one at a time. In all cases but one, this results in a double-bit error, which
is detected. But when the erroneous bit is complemented, the data is error-free,
which is recognized. In spite of this, the CRC method does not seem to be used for
single-bit error correction. Instead, the sender is requested to repeat the whole
transmission if any error is detected.

2.3 Hardware

To develop a hardware circuit for computing the CRC checksum, we reduce the
polynomial division process to its essentials.
The process employs a shift register, which we denote by CRC. This is of length r (the
degree of G) bits, not r+1 as we might expect. When the subtractions (exclusive or’s) are
done, it is not necessary to represent the high-order bit, because the high-order bits of G and
the quantity it is being subtracted from are both 1.
The division process might be described informally as follows:
1. Initialize the CRC register to all 0-bits.
2. Get first/next message bit m.
3. If the high-order bit of CRC is 1, shift CRC and m together left 1 position, and XOR the
result with the low-order r bits of G. Otherwise, just shift CRC and m left 1 position.
4. If there are more message bits, go back to step 2 to get the next one.
It might seem that the subtraction should be done first, and then the shift. It would be
done that way if the CRC register held the entire generator polynomial, which in bit form is
r+1 bits. Instead, the CRC register holds only the low-order r bits of G, so the shift is done
first, to align things properly.
Below is shown the contents of the CRC register for the generator G = x^3 + x + 1 and the
message M = x^7 + x^6 + x^5 + x^2 + x. Expressed in binary, G = 1011 and M = 11100110.
1. 000 Initial CRC contents. High-order bit is 0, so just shift in the first message bit.
2. 001 High-order bit is 0, so just shift in the second message bit, giving:
3. 011 High-order bit is 0 again, so just shift in the third message bit, giving:
4. 111 High-order bit is 1, so shift (in the fourth message bit) and then XOR with 011, giving:
5. 101 High-order bit is 1, so shift (in the fifth message bit) and then XOR with 011, giving:
6. 001 High-order bit is 0, so just shift in the sixth message bit, giving:
7. 011 High-order bit is 0, so just shift in the seventh message bit, giving:
8. 111 High-order bit is 1, so shift (in the eighth message bit) and then XOR with 011, giving:
9. 101 There are no more message bits, so this is the remainder.
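This register trace can be reproduced with a short Python sketch of the three-step procedure above (a software illustration only, not the hardware circuit itself):

```python
def crc_register_trace(message_bits, g_low, r):
    """Follow the informal algorithm: shift the r-bit register and the message bit
    left together; if the bit shifted out was 1, XOR in the low-order r bits of G."""
    crc = 0
    mask = (1 << r) - 1
    for m in message_bits:
        high = (crc >> (r - 1)) & 1          # high-order bit of the register
        crc = ((crc << 1) | m) & mask        # shift CRC and m together, drop the high bit
        if high:
            crc ^= g_low                     # XOR with the low-order r bits of G
        print(format(crc, "0{}b".format(r)))
    return crc

# G = x^3 + x + 1 -> 1011, low-order 3 bits = 011; M = 11100110
remainder = crc_register_trace([1, 1, 1, 0, 0, 1, 1, 0], 0b011, 3)
# Printed states: 001 011 111 101 001 011 111 101 -> remainder 101 = M mod G
```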
These steps can be implemented with the (simplified) circuit shown in Figure 2.2, which is
known as a feedback shift register.

Figure 2.2: Polynomial division circuit for G = x^3 + x + 1

The three boxes in the figure represent the three bits of the CRC register. When a
message bit comes in, if the high-order bit (x2 box) is 0, simultaneously the message bit is
shifted into the x0 box, the bit in x0 is shifted to x1, the bit in x1 is shifted to x2, and the bit in x2
is discarded. If the high-order bit of the CRC register is 1, then a 1 is present at the lower
input of each of the two XOR gates. When a message bit comes in, the same shifting takes
place but the three bits that wind up in the CRC register have been XORed with binary 011.
When all the message bits have been processed, the CRC holds M mod G. If the circuit of
Figure 2.2 were used for the CRC calculation, then after processing the message, r (in this
case 3) 0-bits would have to be fed in. Then the CRC register would have the desired
checksum, Mxr mod G. But, there is a way to avoid this step with a simple rearrangement of
the circuit.

Figure 2.3: CRC circuit for G = x^3 + x + 1


Instead of feeding the message in at the right end, feed it in at the left end, r steps
away, as shown in Figure 2.3. This has the effect of premultiplying the input message M by x^r.
But premultiplying and postmultiplying by x^r give the same result, because polynomial
multiplication is commutative. Therefore, as each message bit comes in, the CRC register
contents are the remainder for the portion of the message processed so far, as if that portion
had r 0-bits appended.

ERROR CORRECTING CODES
3.1 Introduction
Environmental interference and physical defects in the communication medium can
cause random bit errors during data transmission. Error coding is a method of detecting
and correcting these errors to ensure information is transferred intact from its source
to its destination. Error coding is used for fault tolerant computing in computer
memory, magnetic and optical data storage media, satellite and deep space
communications, network communications, cellular telephone networks, and almost any
other form of digital data communication. Error coding uses mathematical formulas to
encode data bits at the source into longer bit words for transmission. The "code word" can
then be decoded at the destination to retrieve the information. The extra bits in the code
word provide redundancy that, according to the coding scheme used, will allow the
destination to use the decoding process to determine if the communication medium
introduced errors and in some cases correct them so that the data need not be retransmitted.
Different error coding schemes are chosen depending on the types of errors
expected, the communication medium's expected error rate, and whether or not data
retransmission is possible. Faster processors and better communications technology make
more complex coding schemes, with better error detecting and correcting capabilities,
possible for smaller embedded systems, allowing for more robust communications.
However, tradeoffs between bandwidth and coding overhead, coding complexity and
allowable coding delay between transmissions, must be considered for each application.
Even if we know what type of errors can occur, we can't simply recognize them from the
received data alone. One simple way is to compare the copy received with another copy of the
intended transmission. In this mechanism the source data block is sent twice. The
receiver compares them with the help of a comparator, and if the two blocks differ, a
request for re-transmission is made. To achieve forward error correction, three sets of the
same data block are sent and majority decision selects the correct block. These methods
are very inefficient and increase the traffic two or three times. Fortunately there are more
efficient error detection and correction codes.

There are two basic strategies for dealing with errors. One way is to include
enough redundant information (extra bits are introduced into the data stream at the
transmitter on a regular and logical basis) along with each block of data sent to enable
the receiver to deduce what the transmitted character must have been. The other way is to
include only enough redundancy to allow the receiver to deduce that error has occurred, but
not which error has occurred, and the receiver asks for a retransmission. The former
strategy uses error-correcting codes and the latter uses error-detecting codes. To
understand how errors can be handled, it is necessary to look closely at what an error really is.
Normally, a frame consists of m data bits (i.e., message bits) and r redundant bits (or check
bits). Let the total number of bits be n (= m + r). An n-bit unit containing data and check bits is
often referred to as an n-bit codeword.
Given any two code words, say 10010101 and 11010100, it is possible to determine
how many corresponding bits differ: just EXCLUSIVE-OR the two code words and count
the number of 1's in the result. The number of bit positions in which two code words differ is
called the Hamming distance. If two code words are a Hamming distance d apart, it will
require d single-bit errors to convert one code word into the other. The error-detecting and
error-correcting properties of a code depend on its minimum Hamming distance.
To detect d errors, you need a distance (d+1) code, because with such a code there is no
way that d single-bit errors can change a valid code word into another valid code word.
Whenever the receiver sees an invalid code word, it can tell that a transmission error has
occurred.
Similarly, to correct d errors, you need a distance (2d+1) code, because that way
the legal code words are so far apart that even with d changes, the original code word is still
closer than any other code word, so it can be uniquely determined.
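A quick sketch of the Hamming-distance computation just described (XOR the two code words and count the 1's), applied to the example code words from the text:

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length code words differ:
    XOR them and count the 1 bits in the result."""
    return bin(a ^ b).count("1")

# The two example code words mentioned above differ in two positions.
print(hamming_distance(0b10010101, 0b11010100))   # prints 2
```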
First, the various types of errors are introduced in Sec. 3.2, followed by different
error-detecting codes in Sec. 3.3. Finally, error-correcting codes are introduced
in Sec. 3.4.

3.2 Types of errors
Interference on the transmission medium can change the timing and shape of the signal. If the
signal is carrying binary encoded data, such changes can alter the meaning of the
data. These errors can be divided into two types: single-bit errors and burst errors.
Single bit error:
The term single-bit error means that only one bit of a given data unit (such as
a byte, character, or word) is changed from 1 to 0 or from 0 to 1, as shown in Fig. 3.1.

Figure: 3.1 single Bit Error


Single-bit errors are the least likely type of error in serial data transmission. To see why,
imagine a sender sends data at 10 Mbps. This means that each bit lasts only 0.1 μs
(microseconds). For a single-bit error to occur, the noise must have a duration of only 0.1 μs,
which is very rare. However, a single-bit error can happen if we are using parallel data
transmission. For example, if 16 wires are used to send all 16 bits of a word at the same time
and one of the wires is noisy, one bit is corrupted in each word.
Burst errors

Figure: 3.2 Burst errors
Burst errors are most likely to happen in serial transmission. The duration of the
noise is normally longer than the duration of a single bit, which means that when noise affects
data, it affects a set of bits, as shown in Fig. 3.2. The number of bits affected depends on the
data rate and the duration of the noise.

3.3 Error Detecting Codes:


Basic approach used for error detection is the use of redundancy, where additional bits are
added to facilitate detection and correction of errors. Popular techniques are:
● Simple Parity check
● Two-dimensional Parity check
● Checksum
● Cyclic redundancy check

3.3.1 SIMPLE PARITY CHECKING OR ONE-DIMENSIONAL PARITY CHECKING
The most common and least expensive mechanism for error detection is the
simple parity check. In this technique, a redundant bit, called the parity bit, is appended to
every data unit so that the number of 1s in the unit (including the parity bit) becomes even.
Blocks of data from the source are passed through a check-bit or parity-bit generator,
where a parity bit of 1 is added to the block if it contains an odd number of 1's (ON bits) and 0
is added if it contains an even number of 1's.

At the receiving end the parity bit is computed from the received data bits and
compared with the received parity bit, as shown in Fig. 3.3. This scheme makes the total
number of 1's even, which is why it is called even-parity checking. Considering a 4-bit word,
the different combinations of data words and the corresponding code words are given in
Table 3.1.

Figure: 3.3 Even-parity checking scheme


Table: 3.1 Possible 4-bit data words and corresponding code words

Note that for the sake of simplicity, we are discussing here even-parity
checking, where the number of 1's should be an even number. It is also possible to use
odd-parity checking, where the number of 1's should be odd.
Performance:
An observation of the table reveals that to move from one code word to another, at
least two data bits must be changed. Hence this set of code words is said to
have a minimum distance (Hamming distance) of 2, which means that a receiver that
has knowledge of the code word set can detect all single-bit errors in each code
word. However, if two errors occur in a code word, it becomes another valid member of the
set; the decoder will see only another valid code word and know nothing of the error.
Thus errors in more than one bit cannot be detected. In fact, it can be shown that a single
parity check code can detect only an odd number of errors in a code word.

3.3.2 TWO DIMENSIONAL PARITY CHECK:

Performance can be improved by using two-dimensional parity check, which
organizes the block of bits in the form of a table. Parity check bits are calculated for
each row, which is equivalent to a simple parity check bit. Parity check bits are also
calculated for all columns then both are sent along with the data. At the receiving end
these are compared with the parity bits calculated on the received data. This is illustrated in
Fig. 3.4.

Figure 3.4 Two-dimension Parity Checking

Performance
Two-dimensional parity checking increases the likelihood of detecting burst errors. As
shown in Fig. 3.4, a 2-D parity check of n bits can detect a burst error of n bits.
A burst error of more than n bits is also detected by the 2-D parity check with high probability.
There is, however, one pattern of error that remains elusive. If two bits in one data unit are
damaged and two bits in exactly the same positions in another data unit are also damaged, the
2-D parity checker will not detect the error. For example, consider the two data units 11001100
and 10101100. If the first and the second-from-last bits in each of them are changed, making
the data units 01001110 and 00101110, the errors cannot be detected by the 2-D parity check.
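The following Python sketch illustrates two-dimensional parity as described above, including the rectangular four-bit error pattern from the example that goes undetected; the row/column layout is an assumption made for illustration only.

```python
def two_d_parity(rows):
    """Append an even-parity bit to each row, then a final row of column parities
    (computed over the data rows, including their row-parity bits)."""
    with_row_parity = [r + [sum(r) % 2] for r in rows]
    col_parity = [sum(col) % 2 for col in zip(*with_row_parity)]
    return with_row_parity + [col_parity]

def two_d_check(block):
    """Recompute the row and column parities and compare with the stored ones."""
    data_rows = block[:-1]
    rows_ok = all(sum(r[:-1]) % 2 == r[-1] for r in data_rows)
    cols_ok = all(sum(col) % 2 == block[-1][j]
                  for j, col in enumerate(zip(*data_rows)))
    return rows_ok and cols_ok

block = two_d_parity([[1, 1, 0, 0, 1, 1, 0, 0],      # the two data units from the text
                      [1, 0, 1, 0, 1, 1, 0, 0]])
assert two_d_check(block)                            # error-free block passes

damaged = [row[:] for row in block]
for i, j in [(0, 0), (0, 6), (1, 0), (1, 6)]:        # the rectangular 4-bit error pattern
    damaged[i][j] ^= 1
assert two_d_check(damaged)                          # this pattern goes undetected
```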

3.3.3 CHECKSUM:

In the checksum error detection scheme, the data is divided into k segments, each of m
bits. At the sender's end the segments are added using 1's complement arithmetic to get the
sum. The sum is complemented to get the checksum. The checksum segment is sent along
with the data segments, as shown in Fig. 3.5(a). At the receiver's end, all received segments
are added using 1's complement arithmetic to get the sum, and the sum is complemented. If the
result is zero, the received data is accepted; otherwise it is discarded, as shown in
Fig. 3.5(b).

Performance
The checksum detects all errors involving an odd number of bits. It also detects most
errors involving an even number of bits.

Figure 3.5 (a) Sender’s end for the calculation of the checksum,
(b) Receiving end for checking the checksum
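A small Python sketch of the checksum procedure described above, using 1's complement (end-around carry) addition; the 8-bit segment width and the example segment values are assumptions made for illustration.

```python
def ones_complement_sum(segments, m):
    """Add m-bit segments using 1's complement (end-around carry) arithmetic."""
    mask = (1 << m) - 1
    total = 0
    for s in segments:
        total += s
        total = (total & mask) + (total >> m)   # fold the carry back into the sum
    return total

def checksum(segments, m):
    """Sender: complement of the 1's complement sum of the data segments."""
    return ones_complement_sum(segments, m) ^ ((1 << m) - 1)

def accept(segments_with_checksum, m):
    """Receiver: sum everything, complement, and accept only if the result is zero."""
    return checksum(segments_with_checksum, m) == 0

data = [0b10110011, 0b01101100, 0b11110000]   # three 8-bit segments (made-up example)
frame = data + [checksum(data, 8)]
assert accept(frame, 8)                       # clean frame is accepted
frame[1] ^= 0b00000100                        # corrupt one bit
assert not accept(frame, 8)                   # corrupted frame is discarded
```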

3.3.4 CYCLIC REDUNDANCY CHECK (CRC):

The cyclic redundancy check is the most powerful and easiest to implement of these techniques.
Unlike checksum scheme, which is based on addition, CRC is based on binary division. In
CRC, a sequence of redundant bits, called cyclic redundancy check bits, are
appended to the end of data unit so that the resulting data unit becomes exactly divisible by a
second, predetermined binary number. At the destination, the incoming data unit is divided by
the same number. If at this step there is no remainder, the data unit is assumed to be correct
and is therefore accepted. A remainder indicates that the data unit has been damaged in
transit and therefore must be rejected. The generalized technique can be explained as
follows.
If a k bit message is to be transmitted, the transmitter generates an r-bit sequence,
known as Frame Check Sequence (FCS) so that the (k+r) bits are actually being
transmitted. Now this r-bit FCS is generated by dividing the original number, appended by r
zeros, by a predetermined number. This number, which is (r+1) bit in length, can also be
considered as the coefficients of a polynomial, called Generator Polynomial.
The remainder of this division process generates the r-bit FCS. On receiving
the packet, the receiver divides the (k+r)-bit frame by the same predetermined number, and if
it produces no remainder, it can be assumed that no error has occurred during the
transmission. The operations at both the sender and the receiver end are shown in Fig. 3.6.

Figure: 3.6 Basic scheme for Cyclic Redundancy Checking

The mathematical operation performed is illustrated in Fig. 3.7 by dividing a sample
4-bit number by the coefficients of the generator polynomial x^3 + x + 1, which is 1011, using
modulo-2 arithmetic. Modulo-2 arithmetic is a binary addition process without any carry-over,
which is just the Exclusive-OR operation. Consider the case where k = 1101. Hence we
have to divide 1101000 (i.e. k appended with 3 zeros) by 1011, which produces the remainder
r = 001, so that the bit frame (k+r) = 1101001 is actually transmitted through the
communication channel. At the receiving end, if the received number, i.e. 1101001, is divided
by the same generator polynomial 1011 and the remainder is 000, it can be assumed that
the data is free of errors.

Figure: 3.7 Cyclic Redundancy Checks (CRC)


The transmitter can generate the CRC by using a feedback shift register circuit. The same
circuit can also be used at the receiving end to check whether any error has occurred. All the
values can be expressed as polynomials of a dummy variable X.
For example, for P = 11001 the corresponding polynomial is X^4 + X^3 + 1. A polynomial is
selected to have at least the following properties:
● It should not be divisible by X.
● It should not be divisible by (X+1).
The first condition guarantees that all burst errors of a length equal to the degree of
polynomial are detected. The second condition guarantees that all burst errors affecting an
odd number of bits are detected.

The CRC process can be expressed as X^n M(X)/P(X) = Q(X) + R(X)/P(X). Commonly used
divisor polynomials are:
• CRC-16 = X^16 + X^15 + X^2 + 1
• CRC-CCITT = X^16 + X^12 + X^5 + 1
• CRC-32 = X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 +
X^2 + X + 1
Performance
CRC is a very effective error detection technique. If the divisor is chosen according to the
previously mentioned rules, its performance can be summarized as follows:
• CRC can detect all single-bit errors.
• CRC can detect all double-bit errors (provided the divisor contains at least three 1's).
• CRC can detect any odd number of errors (provided the divisor is divisible by X+1).
• CRC can detect all burst errors of length less than or equal to the degree of the polynomial.
• CRC detects most of the larger burst errors with a high probability.
• For example, CRC-12 detects 99.97% of burst errors of length 12 or more.

3.4 ERROR CORRECTING CODES:


The techniques that we have discussed so far can detect errors, but do not correct them.
Error Correction can be handled in two ways
● One is that when an error is discovered, the receiver can have the sender retransmit the entire
data unit. This is known as backward error correction.
● In the other, receiver can use an error-correcting code, which automatically corrects
certain errors. This is known as forward error correction.
In theory it is possible to correct any number of errors, but the amount of redundancy needed
to correct multiple-bit or burst errors is so high that in most cases it is inefficient to do so. For
this reason, most error correction is limited to one-, two- or, at the most, three-bit errors.

CYCLIC REDUNDANCY CHECK
4.1 Introduction
Cyclic redundancy check is widely used in data communication and storage devices
as a powerful method for dealing with data errors. One of the most established hardware
solutions for CRC calculation is the linear feedback shift register and logic gates. This simple
architecture processes the bits serially. In situations such as high-speed data communications,
the speed of this serial implementation is obviously inadequate. In these cases, parallel
computation of the CRC is desirable. Parallel CRC calculation can significantly increase the
throughput of CRC computations.
For example, the throughput of the 32-bit parallel calculation of CRC-32 can
achieve several gigabits per second. However, that is still not enough for high-speed
applications such as Ethernet networks. A possible solution is to process more bits in parallel.
Variants of CRC are used in many applications: CRC-16 in BISYNC protocols, CRC-32 in
Ethernet frames for error detection, CRC-8 in ATM, and CRC-CCITT in the X.25 protocol,
disc storage, SDLC, and XMODEM.

4.2 Importance of Error Detection


A transmitted bit can be received in error, due to “noise” on the transmission
channel. If we are dealing with voice or video data, the occurrence of errors in a small
percentage of bits is quite tolerable, but in many other cases it is crucial that all bits be
received intact. If for example we are downloading a binary program file, the program may
be unexecutable if even one bit is incorrect. If the destination or source address of a packet
has one bit wrong, communication is impossible. There are many methods which have been
developed to detect errors, applied at different levels of the seven layer model. Of course, no
method can detect all errors, but a number of methods in use today are amazingly effective.
The one we will discuss here is the famous CRC code.

4.3 Introduction to CRC
Some systems go to great lengths to detect data errors. Parity is often used with
parallel forms of data, on buses or memories, to detect some errors. It provides a small
measure of robustness by detecting certain bit errors with minimal redundancy. However,
while parity can detect single-bit errors, it can detect only half of all multiple-bit errors.
Other systems go further, employing Hamming codes to not only detect, but in
many instances correct, bit errors. Both of these approaches are applied to data in its parallel
form. Unfortunately, the use of a Hamming code requires many more bits of redundancy, per
character or word, than parity. For transmission of data on high-speed serial channels, the
most prevalent errors are multi-bit bursts. These multi-bit errors make parity worthless, and
severely limit the effectiveness of single-bit correcting Hamming codes.
The large amount of redundancy in a Hamming code (7 bits to protect a 32-bit
word) also makes it a poor choice to protect data across a serial link. Transmission of the
redundant bits in each word can easily consume a fifth of the available link bandwidth, or
require operation of the link at a 20% faster transfer rate to carry the redundant bits.
In reality, bit errors of any type are quite rare in these links (<< 1 in 10^12 bits). Since these
errors cannot generally be corrected by a Hamming code or detected by character parity, the
transmission overhead of these types of detection/correction bits becomes a poor use of link
bandwidth. In systems where data is sent serially across a link, the data integrity of the link
can be much better verified using Cyclic Redundancy Check (CRC) codes.
Since the CRC algorithm can always be implemented as a software algorithm on a
standard CPU, the software solution is cheap or free in terms of hardware cost. The
drawback is obviously the computational speed. A linear feedback shift register (LFSR) with
a serial data feed has been used to implement the CRC algorithm. Like other hardware
implementations, this method simply performs a division, and the remainder, which is the
resulting CRC checksum, is stored in the registers after each clock cycle. Simplicity and low
power dissipation are the main advantages. This method gives much higher throughput than
the software solution, but it still cannot fulfill the speed requirements of today's communication
protocols. In order to improve the computational speed of the CRC, parallelism has been
introduced, but still this alone cannot fulfill the speed requirements of today's communication
protocols.

4.4 CRC Codes:


CRC codes make use of a Linear Feedback Shift Register (LFSR) to generate a
signature based on the contents of any data passed through it. This signature can be used to
detect the modification or corruption of bits in a serial stream. CRC codes have been used for
years to detect data errors on interfaces, and their operation and capabilities are well
understood.
Two codes that have found wide use are CRC–16 and CRC–32. As the names imply,
CRC–16 makes use of a 16-bit LFSR, while CRC–32 uses a 32-bit LFSR.
Cyclic redundancy check (CRC) coding is an error-control coding technique for detecting
errors that occur when a message is transmitted. Unlike block or convolutional codes, CRC
codes do not have a built-in error-correction capability. Instead, when a communications
system detects an error in a received message word, the receiver requests the sender to
retransmit the message word.
In CRC coding, the transmitter applies a rule to each message word to create extra
bits, called the checksum, or syndrome, and then appends the checksum to the message word.
After receiving a transmitted word, the receiver applies the same rule to the received word. If
the resulting checksum is nonzero, an error has occurred, and the transmitter should resend
the message word.

4.5 General CRC Generation


The General CRC Generator block computes a checksum for each input frame,
appends it to the message word, and transmits the result. The General CRC Syndrome
Detector block receives a transmitted word and calculates its checksum. The block has two
outputs. The first is the message word without the transmitted checksum. The second output
is a binary error flag, which is 0 if the checksum computed for the received word is zero, and
1 otherwise. The CRC-N Generator block and CRC-N Syndrome Detector block are special

cases of the General CRC Generator block and General CRC Syndrome Detector block,
which use a predefined CRC-N polynomial, where N is the number of bits in the checksum.

Table: 4.1 Modulo-2 Arithmetic

Fig.4.1 shows a long division example. In the example, the divisor is equal to
“11011,” whereas the dividend is equal to “1000111011000.” The long division process
begins by placing the 5 bits of the divisor below the 5 most significant bits of the dividend.
The next step in the long division process is to find how many times the divisor “11011”
“goes” into the 5 most significant bits of the dividend “10001”.

Figure 4.1: Long division using modulo-2 arithmetic


In ordinary arithmetic, 11011 goes zero times into 10001 because the second number is
smaller than the first. In modulo-2 arithmetic, however, the number 11011 goes exactly one
time into 10001. To decide how many times a binary number goes into another in modulo-2
arithmetic, a check is being made on the most significant bits of the two
numbers. If both are equal to “1” and the numbers have the same length, then the first number
goes exactly one time into the second number; otherwise, it is zero times. Next, the divisor
11011 is subtracted from the most significant bits of the dividend 10001 by performing an
XOR logical operation.
The next bit of the dividend, which is “1”, is then marked and appended to the
remainder “1010”. The process is repeated until all the bits of the dividend are marked. The

remainder that results from such long division process is often called CRC or CRC
“checksum” (although CRC is not literally a checksum).
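The long-division procedure of Figure 4.1 can be checked with a short modulo-2 division routine; this integer-based Python sketch simply XORs shifted copies of the divisor into the dividend and prints the quotient and remainder it computes for the operands quoted above.

```python
def mod2_divide(dividend, divisor):
    """Long division in modulo-2 (GF(2)) arithmetic on integers whose bits are the
    polynomial coefficients; returns (quotient, remainder)."""
    quotient = 0
    deg_div = divisor.bit_length() - 1
    while dividend.bit_length() > deg_div:        # while the dividend's degree >= divisor's
        shift = (dividend.bit_length() - 1) - deg_div
        quotient |= 1 << shift
        dividend ^= divisor << shift              # subtraction is just XOR
    return quotient, dividend

q, r = mod2_divide(0b1000111011000, 0b11011)      # the operands used in Figure 4.1
print(format(q, "b"), format(r, "04b"))           # remainder works out to 0101 here
```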

4.6 SERIAL CRC


Traditional method for generating serial CRC is based on linear feedback shift
registers (LFSR). The main operation of LFSR for CRC calculations is nothing more than the
binary divisions. Binary divisions generally can be performed by a sequence of shifts and
subtractions.
In modulo 2 arithmetic the addition and subtraction are equivalent to bitwise XORs
and multiplication is equivalent to AND. Figure 4.2 illustrates the basic architecture of
LFSRs for serial CRC calculation.

Figure 4.2: Basic LFSR Architecture


As shown in Fig. 4.2, d is the serial data input, X is the present state (the generated CRC), X' is
the next state and p is the generator polynomial. The working of the basic LFSR architecture is
expressed in terms of the following equations:
X0' = (P0 ⊗ Xm-1) ⊕ d
Xi' = (Pi ⊗ Xm-1) ⊕ Xi-1
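Below is a software sketch of the serial update equations above, with the register and the low-order generator coefficients kept as bit lists. Applied to the small generator of Section 2.3 it reproduces that section's register trace; clocking in r further zero bits would then give the checksum Mx^r mod G. This is an illustration of the equations as written, not of the exact circuit of Figure 4.2.

```python
def lfsr_step(x, d, p, m):
    """One clock of the serial LFSR:
       X0' = (P0 AND X[m-1]) XOR d,  Xi' = (Pi AND X[m-1]) XOR X[i-1].
    x = [x0, ..., x(m-1)], p = low-order generator coefficients [p0, ..., p(m-1)]."""
    fb = x[m - 1]
    nxt = [0] * m
    nxt[0] = (p[0] & fb) ^ d
    for i in range(1, m):
        nxt[i] = (p[i] & fb) ^ x[i - 1]
    return nxt

# G = x^3 + x + 1: low-order coefficients p0, p1, p2 = 1, 1, 0
p = [1, 1, 0]
x = [0, 0, 0]
for d in [1, 1, 1, 0, 0, 1, 1, 0]:        # M = 11100110, MSB fed first
    x = lfsr_step(x, d, p, 3)
print(x)   # [1, 0, 1] i.e. register x2 x1 x0 = 101 = M mod G, matching Section 2.3;
           # feeding 3 further zero bits would give the checksum M*x^3 mod G
```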
CRC–16 and CRC–32 :
In general CRC codes are able to detect:
● All single- and double-bit errors.
● All odd numbers of errors.
● All burst errors less than or equal to the degree of the polynomial used.

● Most burst errors greater than the degree of the polynomial used.
Figure 4.3 shows the circuit for the CRC-16 polynomial.

Figure 4.3: Linear Feedback Implementation of CRC–16

The generator polynomial for CRC-16 is listed in Equation 1, and the polynomial for
CRC-32 is listed in Equation 2. These CRC codes are traditionally calculated on the serial
data stream using a linear feedback shift register (LFSR) built from flip-flops and XOR
gates, as shown in Figure 4.3. The structure for the CRC-32 polynomial is shown in Figure
4.4.
Equation 1: G(x) = x^16 + x^15 + x^2 + 1
Equation 2: G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
In these equations, the superscripts identify the tap locations in the shift register. The order of
the polynomial is identified by the highest-order term and specifies the number of flip-flops
in the shift register; all of the arithmetic on these polynomials is modulo-2.
The generator polynomial for CRC-32 is as follows:
G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
We can extract the coefficients of G(x) and represent them in binary form as
P = {p32, p31, …, p0}
P = {100000100110000010001110110110111}
Figure 4.4 shows the circuit for the CRC-32 polynomial.

Figure 4.4: CRC circuit for CRC-32.
The frame check sequence (FCS) will be generated after (k + m) cycles, where k indicates the
number of data bits and m indicates the order of the generator polynomial. For a 32-bit serial
CRC, if the order of the generator polynomial is 32 and there are 32 data bits, the serial CRC
will be generated after 64 cycles.

4.7 Parallel CRC Computation


When used with high-speed serial data, especially data which is encoded in the
serial domain, it becomes quite difficult to implement the CRC calculation using a shift
register. However, it is possible to convert a serial implementation into a parallel form that
accumulates multiple bits in each clock cycle. To achieve higher throughput, the CRC’s serial
LFSR implementation must be converted into a parallel N-bit-wide circuit, where N is the
design data path width, so that N bits are processed in every clock. This is a parallel CRC
implementation.
Figure 4.5 is a simplified block diagram of the parallel CRC.

Figure 4.5: Parallel CRC structure

This is a parallel CRC block. The next state CRC output is a function of the current state
CRC and the data.
There are different techniques for parallel CRC generation, given as follows.
● A Table-Based Algorithm for Pipelined CRC Calculation.
● Fast CRC Update
● F matrix based parallel CRC generation.
● Unfolding, Retiming and pipelining Algorithm

The following paragraphs and tables describe how the CRC-16 polynomial is converted to
calculate eight bits at a time (i.e., on a byte basis). The CRC-32 polynomial is converted using a
similar procedure, with the results calculated 16 bits at a time (on a half-word basis). The
results for CRC-32 are presented in Table 4.6 and Table 4.7, but without the intermediate
calculations.

4.8 Implementation
1. Ri is the ith bit of the CRC register.
2. Ci is the contents of ith bit of the initial CRC register, before any shifts have taken place.
3. R1 is the least significant bit (LSB).
4. The entries under each CRC register bit indicate the values to be XORed together to
generate the content of that bit in the CRC register.
5. Di is the data input, with LSB input first.
6. D8 is the MSB of the input byte, and D1 is the LSB.
7. A substitution is made to reduce the table size, such that Xi = Di XOR Ci.
The results of the CRC are calculated one bit at a time and the resulting equations for
each bit are examined. The CRC register prior to any shifts is shown in Table 4.2. The CRC
register after a single bit shift is shown in Table 4.3. The CRC register after two shifts is
shown in Table 4.4.

This process continues until eight shifts have occurred. Table 4.5 lists the CRC register
contents after eight shifts. Xi was substituted for the various Di XOR Ci combinations. The
following properties were used to simplify the equations:
1.Commutative property (A XOR B = B XOR A).
2.Associative property (A XOR B XOR C = A XOR C XOR B).
3. Involution property (A XOR A = 0).
A study of Table 4.5 reveals two interesting facts:
The most-significant byte (bits R16–R9) of the CRC register is only dependent on
XOR combinations of the initial low-order byte of the CRC register and the input byte.
The least-significant byte (bits R8–R1) of the CRC register is dependent on the XOR
combination of the initial lower eight bits of the CRC register, the input data byte, and the
initial contents of the high-order bits of the CRC register.
This allows the next value of the CRC register to be calculated as an XOR of the input
data character bits, and a constant determined by the present contents of the CRC register. For
example, calculating a new value for R9 is accomplished by calculating X3 and X2 and
exclusive-ORing them together.
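In software, the same byte-at-a-time update is commonly realized with a 256-entry lookup table whose index is exactly the (low CRC byte XOR data byte) combination, i.e. the Xi = Di XOR Ci substitution used above. The sketch below assumes the usual LSB-first (reflected) convention for CRC-16, with the polynomial x^16 + x^15 + x^2 + 1 written in reflected form as 0xA001; the test string and check value correspond to the common CRC-16/ARC parameter set, which may differ in bit ordering from the tables presented here.

```python
def make_crc16_table(poly_reflected=0xA001):
    """256-entry table for LSB-first (reflected) CRC-16, built one bit at a time."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly_reflected if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = make_crc16_table()

def crc16_update(crc, data):
    """Process one byte per step: the table index is (low CRC byte XOR data byte),
    the same X = D XOR C substitution used in the byte-wise derivation above."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc

print(hex(crc16_update(0x0000, b"123456789")))   # 0xbb3d for the CRC-16/ARC parameters
```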

Table 4.2: CRC–16 Register prior to any shifts

Table 4.3: CRC–16 Register after One Shift

Table 4.4: CRC–16 Register after Two Shifts

Table 4.5: CRC–16 Register after Eight Shifts


The parallel algorithm for CRC–32 is derived in the same manner as CRC–16.

Table 4.6: CRC–32 Register (LSW) after 16 Shifts with Xi Substitution

Table 4.7: CRC–32 Register (MSW) after 16 Shifts

The differences here are that data is now handled 16 bits at a time, the CRC register
is now 32 bits in length, and a different polynomial is used. Table 4.6 contains the XOR
information for the LSHW of the CRC–32 register after 16 shifts, and Table 4.7 contains the
XOR information for the MSHW of the CRC–32 register after 16 shifts. Again, note that the
MSHW only depends on XOR combinations of the initial lower-order bits of the CRC–32
register and the input data. The LSHW depends on XOR combinations of the initial
lower-order bits of the CRC–32 register, the input data, and the initial MSHW of the CRC–32
register.

PARALLEL ARCHITECTURE

There are different techniques for parallel CRC generation, given as follows.
1. A Table-Based Algorithm for Pipelined CRC Calculation.
2. Fast CRC Update
3. F matrix based parallel CRC generation.
4. Unfolding, Retiming and pipelining Algorithm.

5.1 A Table-Based Algorithm for Pipelined CRC Calculation


The LUT-based architecture provides low-memory LUTs together with deep pipelining.
The table-based architecture has an input stage and three tables, LUT3, LUT2, and LUT1:
LUT3 contains CRC values for the input followed by 12 bytes of zeros, LUT2 for the input
followed by 8 bytes of zeros, and LUT1 for the input followed by 4 bytes of zeros. With this
algorithm a higher throughput can be obtained. The main problem is that the CRC values are
pre-calculated and stored in the LUTs, so the LUTs have to be changed every time the
polynomial is changed. Pipelining is used to reduce the critical path by adding delay elements.
Parallel processing is used to increase the throughput by producing a number of outputs at the
same time. Retiming is used to increase the clock rate of the circuit by reducing the
computation time of the critical path.

Figure 5.1: LUT-based architecture

5.2 Fast CRC Update

In the fast CRC update technique it is not required to calculate the CRC each time for all the
data bits; instead, the CRC is calculated only for those bits that have changed.
There are different approaches to generating the parallel CRC, each having its own advantages
and disadvantages. The table-based architecture requires a pre-calculated LUT, so it cannot be
used for a generalized CRC; the fast CRC update technique requires a buffer to store the old
CRC and data.

Figure 5.2: Fast CRC update architecture


The unfolding architecture increases the iteration bound. The F-matrix-based architecture is
simpler and less complex. The fast CRC update algorithm and its implementation are shown in
Figure 5.2.

5.3 F matrix based parallel CRC generation


Algorithm for the F-matrix-based architecture:
The algorithm and parallel architecture for CRC generation based on the F matrix are
discussed in this section. Figure 5.3 shows the basic algorithm for F-matrix-based parallel
CRC generation.

Figure 5.3: Algorithms for F matrix based architecture
The parallel data input is ANDed with each element of the F matrix, which is generated from
the given generator polynomial, and the result is XORed with the present state of the CRC
checksum. The final result is generated after (k + m)/w cycles.

F Matrix Generation
The F matrix is generated from the generator polynomial, where {p0, ……, pm-1} are the
coefficients of the generator polynomial. For example, the generator polynomial for CRC-4 is
{1, 0, 0, 1, 1}, and w bits are processed in parallel.
Here w = m = 4, for which the F^w matrix is calculated.

As indicated above, having F^4 available, a power of F of lower order (for example F^2) is
immediately obtained.
The same procedure may be applied to derive the equations for this parallel version. In this
case, with the matrix G = P', the update becomes:
X' = F^w ⊗ (X ⊕ D)

5.4 Parallel architecture


The parallel architecture based on the F matrix is illustrated in Figure 5.4. As shown in the
figure, d is the data that is processed in parallel (e.g. 32 bits), X' is the next state, X is the
current state (the generated CRC), and F(i)(j) is the element in the ith row and jth column of the
F^w matrix. If X = [xm-1 ….. x1 x0]^T is used to denote the state of the shift registers, then, from
linear system theory, the state equation for the LFSR can be expressed in modulo-2 arithmetic
as follows:
Xi' = (Pi ⊗ Xm-1) ⊕ Xi-1
Where, X(i) represents the ith state of the registers, X(i+1) denotes the (i+1)th
state of the registers, d denotes the one bit shift-in serial input.
F is an m × m matrix and G is an m × 1 column vector:
G = [0 0 ……… 0 1]^T
Furthermore, if F and G are substituted into the state equation, it can be rewritten in matrix
form, giving finally:
X' = F^w ⊗ X ⊕ d
If w bits are processed in parallel, then the CRC will be generated after (k + m)/w cycles.
This equation can be expanded for CRC-4 as given below.
X3'=X2 ⊕ X1 ⊕ X0 ⊕ d3
X2'=X3 ⊕ X2 ⊕ d2
X1'=X3 ⊕ X2 ⊕ X1 ⊕ d1
X0'=X3 ⊕ X2 ⊕ X1 ⊕ X0 ⊕d0
As an example of parallel CRC calculation with multiple input bits, take w = m = 4. The
dividend is divided into three 4-bit fields, acting as the parallel input vectors D(0), D(1), D(2),
respectively. The initial state is X(0) = [0 0 0 0]^T.

From the equation X' = F^w ⊗ X ⊕ d, we have:

X(4) = F^4 ⊗ X(0) ⊕ D(0)
X(8) = F^4 ⊗ X(4) ⊕ D(1)
X(12) = F^4 ⊗ X(8) ⊕ D(2)
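The recursion above can be imitated in software without writing out F^w explicitly: because the serial state update is linear over GF(2), the combined effect of w clocks on the state and on the w input bits can be recovered by clocking basis vectors through the serial model (superposition). The sketch below does this for the CRC-4 generator {1, 0, 0, 1, 1} with w = m = 4 and checks that the w-bit-at-a-time result matches the bit-serial LFSR; the bit ordering is an assumption, and the sketch is not claimed to reproduce the exact expanded equations listed earlier.

```python
def serial_step(x, d, p, m):
    # One serial LFSR clock; x and p are bit lists [x0..x(m-1)] and [p0..p(m-1)].
    fb = x[m - 1]
    return [(p[0] & fb) ^ d] + [(p[i] & fb) ^ x[i - 1] for i in range(1, m)]

def serial_block(x, bits, p, m):
    for d in bits:
        x = serial_step(x, d, p, m)
    return x

def build_parallel_tables(p, m, w):
    """The state update is linear over GF(2), so the effect of w clocks is
    X' = F^w . X XOR (contribution of the w input bits). Both linear maps are
    recovered by superposition: clock each basis vector through the serial model."""
    zero = [0] * m
    F_state = [serial_block([1 if j == i else 0 for j in range(m)], [0] * w, p, m)
               for i in range(m)]                       # columns of F^w
    F_data = [serial_block(zero, [1 if j == i else 0 for j in range(w)], p, m)
              for i in range(w)]                        # per-input-bit contributions
    return F_state, F_data

def parallel_step(x, word, F_state, F_data, m):
    out = [0] * m
    for i, xi in enumerate(x):
        if xi:
            out = [a ^ b for a, b in zip(out, F_state[i])]
    for j, dj in enumerate(word):
        if dj:
            out = [a ^ b for a, b in zip(out, F_data[j])]
    return out

# CRC-4 generator {1, 0, 0, 1, 1} = x^4 + x + 1, processing w = 4 bits per clock
p, m, w = [1, 1, 0, 0], 4, 4
F_state, F_data = build_parallel_tables(p, m, w)
msg = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]              # 12 bits = three 4-bit words
x_par = [0] * m
for k in range(0, len(msg), w):
    x_par = parallel_step(x_par, msg[k:k + w], F_state, F_data, m)
assert x_par == serial_block([0] * m, msg, p, m)        # parallel matches serial
```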

Figure 5.4: Parallel calculation of CRC-32 for 32bit
The properties of the F^w matrix, and the previously mentioned fact that the equation
X' = F^w ⊗ X ⊕ d can be regarded as a recursive calculation of the next state X' from the matrix
F^w, the current state X and the parallel input D, make the 32-bit parallel calculation suitable
for messages of any length, not just multiples of 32 bits. Remember that the length of the
message is byte based. If the length of the message is not a multiple of 32, then after a sequence
of 32-bit parallel calculations the number of remaining bits of the message could be 8, 16, or 24.
For all these situations, an additional parallel calculation with w = 8, 16, or 24 is needed, by
choosing the corresponding F^w. Since F^w can easily be derived from F^32, the calculation can
be performed using the equation X' = F^w ⊗ X ⊕ d within the same circuit as the 32-bit parallel
calculation; the only difference is the F^w matrix.
Similarly, if the length of the message is not a multiple of the number of parallel processing
bits w = 4 (for example, a 14-bit data stream), then the last two bits (D(3)) need to be calculated
after getting X(12). Therefore, F^2 must be obtained from the matrix F^4, and the extra two bits
are stored at the less significant bits of the input vector D. The equation X' = F^w ⊗ X ⊕ d can
then be applied to calculate the final state X(14), which is the CRC code. Therefore, only one
extra cycle is needed for calculating the extra bits if the data message length is not a multiple of
w, the number of parallel processing bits. It is worth noticing that in the CRC-32 algorithm, the
initial state of the shift registers is preset to all '1's.
Therefore, X(0) = 0xFFFFFFFF. However, the initial state X(0) does not affect the
correctness of the design, and for ease of understanding the initial state X(0) is set to
0x00000000 when the circuit is implemented.

In the proposed architecture, w = 64 bits are processed in parallel and the order of the generator
polynomial is m = 32. As discussed in Section 5.4, if w bits are processed in parallel then
CRC-32 will be generated after (k + m)/w cycles. If we increase the number of bits processed
in parallel, the number of cycles required to calculate the CRC can be reduced.
The proposed architecture can be realized by the equations below:
Xtemp = F^w ⊗ D(0 to 31) ⊕ D(32 to 63)
X' = F^w ⊗ X ⊕ Xtemp (11)
Where,
D (0 to 31) =first 32 bits of parallel data input
D (32 to 63) = next 32 bits of parallel data input
X’=next state
X=present state

Figure 5.5: Block diagram of 64-bit parallel calculation of CRC-32.


In the proposed architecture, di is the parallel input and F(i)(j) is the element of the F^32
matrix located at the ith row and jth column. As shown in Figure 5.5, the input data bits
d0…d31 are ANDed with each row of the F^w matrix and the result is XORed individually
with d32, d33, …, d63. Each XORed result is then XORed with the corresponding bit of the
present CRC-32 state to form X'(i). Finally, X will be the CRC generated after (k + m)/w
cycles, where w = 64.

FIELD PROGRAMMABLE GATE ARRAY

6.1 Introduction
A field-programmable gate array (FPGA) is an integrated circuit created to be
configured by the customer after manufacturing—hence "field-programmable". The FPGA
configuration is generally defined using a hardware description language (HDL), similar to
that used for an application-specific integrated circuit (ASIC) (circuit diagrams were
previously used to specify the configuration, as they were for ASICs, but this is increasingly
rare). FPGAs can be used to implement any logical function that an ASIC can perform. The
ability to update the functionality after shipping, partial re-configuration of the portion of the
design and the low non-recurring engineering costs relative to an ASIC design, offer
advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "connected
together"—somewhat like a one-chip programmable breadboard. Logic blocks can be
configured to perform complex combinational functions, or merely simple logic like AND
and NAND. In most FPGAs, the logic blocks also include memory elements, which may be
simple flip-flops or more complete blocks of memory.
6.2 Architecture
The most common FPGA architecture consists of an array of logic blocks (called
Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O
pads, and routing channels. Generally, all the routing channels have the same width (number
of wires). Multiple I/O pads may fit into the height of one row or the width of one column in
the array.

Figure 6.1: Architecture of FPGA

In general, a logic block (CLB or LAB) consists of a few logical cells. A typical cell
consists of a 4-input Lookup table (LUT), a Full adder (FA) and a D-type flip-flop, as shown.
The LUT is in this figure split into two 3-input LUTs. In normal mode these are
combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are fed
to the FA. In practice, entire or parts of the FA are put as functions into the LUTs in order to
save space.

6.3 FPGA Design and Programming


To specify the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form is more suited to work with large
structures because it's possible to just specify them numerically rather than having to draw
every piece. However, schematic entry can allow for easier visualisation of a design. Then,
using an electronic design automation tool, a technology-mapped net list is generated. The
net list can then be fitted to the actual FPGA architecture using a process called place &
route, usually performed by the FPGA company's proprietary place-and-route software.
The user will validate the map, place and route results via timing analysis,
simulation, and other verification methods. Once the design and validation process is
complete, the binary file generated is used to reconfigure the FPGA. The most common
HDLs are VHDL and Verilog. Though the two languages are similar, we prefer VHDL
for programming because it is widely used.

6.4 Applications
Applications of FPGAs include digital signal processing, software-defined radio,
aerospace and defense systems, ASIC prototyping, medical imaging, cryptography,
bioinformatics, computer hardware emulation, radio astronomy, metal detection, etc. The
inherent parallelism of the logic resources on an FPGA allows for considerable computational
throughput even at low MHz clock rates.
The flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel arithmetic units.
This has driven a new type of processing called reconfigurable computing, where
time-intensive tasks are offloaded from software to FPGAs.

Historically, FPGAs have been reserved for specific vertical applications where the production
volume is small. For these low-volume applications, the premium that companies pay in
hardware costs per unit for a programmable chip is more affordable than the development
resources spent on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.

PROPOSED METHOD

The proposed method presents a unique way of implementing multiple-bit error detection and
single-bit error correction using CRC for frame widths of 24 bits and 32 bits. Let Ftr be the
transmitted frame, in which the checksum is appended after 16 or 8 bits of data. We
can express Ftr as shown in Equation (1).

Ftr = Dtr & Ctr        (1)

Where,
& = concatenation operator
Dtr = transmitted 16- or 8-bit data
Ctr = transmitted 16-bit checksum

At the receiver side, let Fre be the received frame as shown in Equation 2.

Fre = Dre & Cre        (2)

Where,
Cre = received checksum
Dre = received data

The receiver again calculates the CRC on the received data. Let Ccal denote the CRC
calculated over Dre at the receiver side. If no error has occurred during transmission, then Cre
and Ccal are equal; but if some bit(s) are in error, then Cre and Ccal will not match.
In such cases the error needs to be detected and corrected. Hence we
need to calculate the syndrome, which is given by:

Syn = Cre XOR Ccal        (4)

By using Equation (4), the syndrome can be calculated in a minimum number of clock
cycles. This method of syndrome calculation was proposed in the literature and is an
efficient one, hence we have adopted it here.

This method uses two standards, CRC-16 and CRC-8, for correcting single-bit errors.
We have designed the algorithm to detect more than one error, and a flag is raised to
request retransmission of the frame if more than one error is present.

There are two cases in which a single-bit error can occur. In the first case, the erroneous bit
lies in the data bits; in the second case, it lies in the checksum bits. The total number of
possible single-bit error positions for both standards is shown in Table 7.1.

Table 7.1: Possible number of error patterns

For CRC-16, if i is the position of the error, then for case 1, 1 ≤ i ≤ 16, and
for case 2, 17 ≤ i ≤ 32. Similarly, for CRC-8 only the limits change.

In the proposed method we have used a new technique for detecting errors in both
cases. This method follows from the fact that if an error occurs as in case 2, the
syndrome pattern will have a number of 1's equal to the number of bits in error. Since we only
have to correct errors in the data bits, there is no need to correct an error that has occurred in
the checksum bits; we only need to detect it. This method of analysis
is advantageous since it reduces memory requirements by 76%, and because we need to worry only
about errors in the data, it reduces the load on the computational block of the receiver.
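A minimal sketch of this receiver-side check for the CRC-16 case is given below. The entity and port names are assumptions, not taken from the original design; it simply forms the syndrome of Equation (4) and counts the 1's in it to separate the no-error and check-bit-error cases from a data-bit error.

library ieee;
use ieee.std_logic_1164.all;

entity syndrome_check is
  port (
    cre      : in  std_logic_vector(15 downto 0);  -- received checksum
    ccal     : in  std_logic_vector(15 downto 0);  -- checksum recomputed over Dre
    syn      : out std_logic_vector(15 downto 0);
    no_error : out std_logic;                      -- syndrome is all zero
    chk_err  : out std_logic                       -- exactly one '1': error in the check bits
  );
end entity syndrome_check;

architecture rtl of syndrome_check is
  signal s : std_logic_vector(15 downto 0);

  -- count the number of '1' bits in a vector
  function ones(v : std_logic_vector) return natural is
    variable n : natural := 0;
  begin
    for i in v'range loop
      if v(i) = '1' then
        n := n + 1;
      end if;
    end loop;
    return n;
  end function;
begin
  s        <= cre xor ccal;                        -- Syn = Cre XOR Ccal, Equation (4)
  syn      <= s;
  no_error <= '1' when ones(s) = 0 else '0';
  chk_err  <= '1' when ones(s) = 1 else '0';
end architecture rtl;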

To explain the proposed method of correcting a single-bit error, let us consider the example
of a (7,4) CRC code. The syndrome generator circuit for this codeword is shown in Figure 7.1.
This shift-register circuit is similar to the serial CRC circuit presented earlier, except that it
has only three shift register stages, corresponding to the three check bits in the codeword.

Figure 7.1: Syndrome circuit for the (7,4) CRC code

Here the generator polynomial used is g(x) = 1 + x + x^3. The received vector is Z = 1110101,
consisting of three check bits and a nibble of data; in this codeword the first three MSBs are
the check bits, concatenated with the nibble of data. Here the 3rd MSB, i.e. the 3rd check bit,
is in error, so we need to detect and correct it. In the conventional method, the syndrome
output of the circuit in Figure 7.1 is used to address a look-up table which stores
the error pattern. This error pattern is then XORed with the CRC frame to get the correct
data.

In the proposed method the same circuit is used, but more clock cycles are required.
That is, once all 7 received bits have been entered into the syndrome calculator, ‘0’s are fed
into it from the 8th shift onwards, as shown in Table 7.2. Each time a ‘0’ is fed into the
circuit, the shift register contents are tabulated. This process of feeding ‘0’s continues until
the shift register contents read S0 S1 S2 = 100. In general, for an (n−k)-stage shift register,
the contents should read S0, S1, …, S(n−k−1) = 1 0 0 … 0, i.e. a 1 followed by (n−k−1) zeros.
In Table 7.2, we find that at the 12th shift the shift register contents are 100. The error is
then located and corrected as shown below.


Table 7.2: Contents of shift register in the syndrome calculator

As we can see from Table 7.2, after the 7th shift we obtain the syndrome from the
circuit of Figure 7.1. Since this syndrome is not equal to “000”, it indicates an error. The
procedure then continues as explained above until the 12th shift, when the shift register
content is “100”. This shift number indicates the position of the error: since 12 − 7 = 5,
the 5th bit counting from the right is in error. Therefore the error pattern is E = 0010000.

Corrected vector V = Z XOR E
                   = 1110101 XOR 0010000
                   = 1100101

The same method is employed to correct a single-bit error with CRC-8 and CRC-16.
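For reference, the serial syndrome calculator used in this example can be sketched in VHDL as follows. The entity and port names are invented; only the tap positions, which follow g(x) = 1 + x + x^3, come from the example. Feeding the 7 received bits and then '0's into din should reproduce the kind of shift register trace tabulated in Table 7.2, provided the same bit ordering is used.

library ieee;
use ieee.std_logic_1164.all;

entity syndrome_7_4 is
  port (
    clk, rst : in  std_logic;
    din      : in  std_logic;                    -- received bits, then '0's
    s        : out std_logic_vector(2 downto 0)  -- (S0, S1, S2)
  );
end entity syndrome_7_4;

architecture rtl of syndrome_7_4 is
  signal reg : std_logic_vector(2 downto 0) := (others => '0');
begin
  process(clk)
    variable fb : std_logic;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        reg <= (others => '0');
      else
        -- taps follow g(x) = 1 + x + x^3: feedback comes from the last stage
        fb     := reg(2);
        reg(0) <= din xor fb;
        reg(1) <= reg(0) xor fb;
        reg(2) <= reg(1);
      end if;
    end if;
  end process;

  s <= reg;
end architecture rtl;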

7.2 Implementation on FPGA:

Figure 7.2: VLSI Architecture

A parallel VLSI architecture of the decoder circuitry is shown in Figure 7.2.


The syndrome calculator generates the syndrome vector from the received frame using a
circuit similar to the one in Figure 7.1. This operation requires 32 clock cycles, and this timing is
tracked by a Timing and Control Unit with the help of a 5-bit counter. If the
syndrome is all zero, the corrected data is the same as the received data; otherwise, if the
number of 1's in the syndrome pattern is equal to 1, it indicates an error in the check bits.
Finally, if a single-bit error is present in the data bits, which is indicated by neither of the above
conditions being satisfied, then 0's are input into the syndrome calculator circuit, and
after each 0 input the syndrome vector is XORed with the pattern “0000000000000001” and
checked for zero. If the result is zero, the corrected data is obtained by
XORing the received data with the error pattern register content. If not, the
procedure continues and the error pattern register content is shifted left by 1
bit. The initial content of the error pattern register is “0000000000000001”.
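Read as a datapath, this search loop might be sketched as follows. The names, the 16-bit widths (CRC-16 case) and the simple start/done handshake with the control unit are all assumptions made for illustration:

library ieee;
use ieee.std_logic_1164.all;

entity error_corrector is
  port (
    clk, start     : in  std_logic;                     -- start reloads the error pattern
    syndrome       : in  std_logic_vector(15 downto 0); -- from the syndrome calculator
    received_data  : in  std_logic_vector(15 downto 0);
    corrected_data : out std_logic_vector(15 downto 0);
    sbe            : out std_logic                      -- single-bit error corrected
  );
end entity error_corrector;

architecture rtl of error_corrector is
  signal err_pat : std_logic_vector(15 downto 0) := x"0001"; -- "0000000000000001"
  signal done    : std_logic := '0';
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        err_pat <= x"0001";
        done    <= '0';
      elsif done = '0' then
        -- XOR the syndrome with the unit pattern and check for zero
        if (syndrome xor x"0001") = x"0000" then
          corrected_data <= received_data xor err_pat;  -- correct the located bit
          done           <= '1';
        else
          err_pat <= err_pat(14 downto 0) & '0';        -- shift the pattern left by one bit
        end if;
      end if;
    end if;
  end process;

  sbe <= done;
end architecture rtl;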

7.3 State diagram of control unit:

Figure 7.3: State diagram of the control unit

The Timing and Control Unit is designed using the state diagram shown in Figure 7.3.
It consists of four states. Initially, in S0, we need to compute the syndrome; hence we
need a delay of 32 clock cycles while the received bits are routed to the syndrome calculator
circuit. This is achieved by setting the mode pin high and the Sel1 pin low, which connect
the clk input to the 5-bit counter and the CRC frame to the syndrome circuit,
respectively. When the Top signal of the counter becomes 1, it indicates that the required
delay is complete. The state then transitions to S1, where the mode pin is turned
low so that the counter can count the number of 1's in the syndrome vector. The counter
content C is sent back to the control unit, and if the corresponding check is true the NbE
signal goes high. The state jumps to S2 if C = 1 or NbE = 1, and the select pin Sel2 is made
low, which makes the mux connect the received data to the corrected data output. If C ≠ 1 and
NbE = 0, control jumps from S1 to S3, which indicates that a single-bit error is present in
the data bits. In S3 the control unit sets Sel1 high, which inputs 0's into the syndrome
calculator. After each clock cycle the syndrome is XORed with “10000…..0” and checked for
zero using the same generic NOR gate. If the result is zero, SbE goes high, indicating
completion of the process; if not, the error pattern register is shifted left as explained in the
previous section.

The control unit stays in S3 as long as SbE is 0. Once SbE is 1, Sel2 is set to 0 so that
the corrected data output is connected to the pattern obtained by XORing the received data with
the content of the error pattern register.
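The behaviour just described can be captured in a small four-state machine. The sketch below is only one interpretation of Figure 7.3, with assumed port widths and Moore-style outputs; the exact state encoding and output timing of the actual design may differ.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity control_unit is
  port (
    clk, rst : in  std_logic;
    top      : in  std_logic;             -- 5-bit counter rollover flag
    c        : in  unsigned(4 downto 0);  -- number of 1's counted in the syndrome
    nbe      : in  std_logic;             -- status flag from the datapath (see text)
    sbe      : in  std_logic;             -- single-bit data error located and corrected
    mode     : out std_logic;             -- counter mode select
    sel1     : out std_logic;             -- '0': CRC frame, '1': zeros into the syndrome circuit
    sel2     : out std_logic              -- '0': pass data through to the corrected-data output
  );
end entity control_unit;

architecture rtl of control_unit is
  type state_t is (s0, s1, s2, s3);
  signal state : state_t := s0;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= s0;
      else
        case state is
          when s0 =>                        -- wait 32 cycles while the syndrome is computed
            if top = '1' then
              state <= s1;
            end if;
          when s1 =>                        -- inspect the syndrome weight
            if nbe = '1' or c = 1 then
              state <= s2;                  -- no data correction needed
            else
              state <= s3;                  -- single-bit error in the data bits
            end if;
          when s2 =>
            state <= s2;                    -- output stage: data forwarded
          when s3 =>
            if sbe = '1' then
              state <= s2;                  -- error located and corrected
            end if;
        end case;
      end if;
    end if;
  end process;

  -- Moore-style outputs (an assumption; the original may register these differently)
  mode <= '1' when state = s0 else '0';
  sel1 <= '0' when state = s0 else '1';
  sel2 <= '0' when state = s2 else '1';
end architecture rtl;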

INTRODUCTION TO XILINX

8.1 Introduction
Xilinx is a powerful software tool that is used to design, synthesize, simulate, test and
verify digital circuit designs. The designer (you in this case) can describe the digital design
by either using the schematic entry tool or a hardware description language. In this tutorial,
we will create VHDL design input files – the hardware description of the logic circuit,
compile VHDL source files, create a test bench and simulate the design to make sure of the
correct operation of the design (functional simulation). The purpose of this tutorial is to give
new users exposure to the basic and necessary steps needed to implement and examine their own
designs using the ISE environment. In this tutorial, we will design one simple module (an OR gate);
however, in the future, you will be designing such modules and completing the overall circuit
design from these existing files.

A VHDL input file in the Xilinx environment consists of:


● Entity Declarations: module name and interface specifications (I/O) – list of input
and output ports; their mode, which is direction of data flow; and data type.
● Architecture: defines a component’s logic operation.

As you will learn (or have learned) in this course, there are different styles for the architecture
body:
● Behavioral – set of sequential assignment statements
● Data Flow – set of concurrent assignments
● Structural – set of interconnected components

A combination of these could be used, but in this tutorial we will use Dataflow. In its simplest
form, the architectural body will take the following format, regardless of the style:

architecture architecture_name of entity_name is
begin
   … -- statement
end architecture_name;

8.2 Basic Software Requirements


• After creating an account, install Xilinx software: ISE 14.7 from the website at
http://www.xilinx.com/support/download/index.htm
• For a step by step process of downloading and installing Xilinx ISE WebPack (student
version), go to the appendix at the end of tutorial.
• For extra help with the installation, go to:
http://www.xilinx.com/support/documentation/dt_ise.htm

8.3 ISE Project Navigator


In this section, we introduce the reader to the main components of the “ISE Project
Navigator” window, which allows us to manage our design files and move our design process
from creation, to synthesis, to the simulation phase.

Figure 8.1: Xilinx Project Navigator window

1. Toolbar: provides fast access to frequently used menu commands.


2. Design Panel: consists of the following three areas
A. View Pane – allows for viewing only the source files that are associated with the selected
design phase (e.g. test bench source files can only be viewed in the “Simulation”
view).
B. Hierarchy Pane – allows for viewing source files that you created and added to your
project. You can double click a source file to open for editing in the workspace. You
can expand the level of hierarchy by clicking (+) icon or collapse by clicking (-) icon.
C. Process Pane – determines and shows only the processes available to run for the selected
source file. Similarly, they can be expanded and collapsed using (+) and (-) icons,
respectively.
3. Transcript Window – displays output messages from processes you run.

4. Workspace – used to view and edit source files. Multiple files can be opened
simultaneously, and the name of each file is shown in a separate tab at the bottom of the
workspace window to enable you to switch between different files.

8.4 Creating New Project


In this project, we will be designing, synthesizing and simulating a 2-input OR gate,
where “a” and “b” are our inputs and “c” is our output. The truth table (used later to verify
our design) is given below:

a  b  |  c
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  1

In order to start ISE, double click the desktop icon, or click:
Start → All Programs → Xilinx Design Suite 14.4 → ISE Design Tools → 64-bit Project Navigator
You will be presented with “Tip of The Day”; just click “OK”.
● Create a new project by selecting:
1. File → New Project; the following window will appear.


Figure 8.2: Creating a new project using the “New Project Wizard”

2. In the “Name” field enter a short name for your project that correctly describes what you
are designing (For now we will use “ORgate”). Also, make sure that your project name:
● Starts with a letter
● Contains only alphanumeric characters and underscores
● Cannot contain two consecutive underscores.
3. Click the Browse icon (indicated by the arrow in the figure above) in order to select the
desired location to which you would like to save your project.

4. In the “Top-level source type” field, make sure that HDL is selected – this is selected if
the top-level design to be used is in VHDL or Verilog, which can include lower-level modules
such as HDL files, schematics or other types.
5. Click “Next”.
6. In the “Project Settings” page shown below, ensure that the following options are set,
because they affect the types and processes that will be available for your design:
● “Product Category”: All
● “Family”: Spartan3E
● “Device”: XC3S500E
● “Package”: FT256
● “Speed”: -5
● “Top-Level Source Type”: HDL (automatically selected)
● “Synthesis Tool”: XST (VHDL/Verilog), which is a technology to synthesize VHDL,
Verilog, or mixed-language designs to create Xilinx-specific netlist files.
● “Simulator”: ISim (VHDL/Verilog), which allows for running the integrated simulation
process as part of your ISE design flow.
● “Preferred Language”: VHDL
7. Leave the remaining fields as their default settings.
8. Click “Next” and you will be presented with a summary of your new project.
In order to open an existing project in Xilinx, select File → Open Project to show the
list of projects available in a given directory, choose the project you want and click “OK”.
9. Click “Finish” to exit the “New Project Wizard” and return to the
original “ISE Project Navigator” window; a new project hierarchy is generated with the
“ORgate” design file displayed in the “Hierarchy Pane”.

Creating VHDL Source Files


The “Create New Source Wizard” enables you to create a VHDL source input
file (.vhd) for a combinational logic design; this file will contain the design of
the 2-input OR gate. (Any other text editor can be used to do so.)

● Click on the “New Source” icon, which is to the left of the “Hierarchy Pane.”
This can also be done by right clicking on the “ORgate” source file in the “Hierarchy
Pane” and clicking “New Source,” as shown in Figure 8.3. This will take us to the
“New Source Wizard” shown in Figure 8.4.

Figure 8.3: Creating a new source

Figure 8.4: New Source Wizard

● Select “VHDL Module” as the source file type to be added to the project, since our
files will contain VHDL design code and will therefore have the “.vhd” extension.
● In the “File name” field, enter the name of the entity for which you are creating
input and output ports. Remember to follow the conventions mentioned earlier
(in Section 8.4, step 2) for naming. In this case, enter “ORgate”.
● For the “Location” field, click the browse icon to navigate to the appropriate
folder, which should be the same one used for creating the project.
● Make sure that the “Add to project” checkbox is selected to automatically add
this source to your project so that you don’t need to add it to the project again
manually.
● Click “Next”; the wizard will take you to the “Define Module” page as shown
below, where the I/O of the module (OR gate) is defined. As you can see, the
entity name is already filled in (but can be changed if you want), and the architecture
name is “Behavioral” by default.
● “Direction” field is used to describe the mode, which is how data is transferred
through the port. We are concerned with 3 modes: in – data flowing into the port;
out – data flowing out of the port; inout – data flowing into and out of the port
(bi-directional). Since we have 2 inputs and 1 output, in the first 3 fields under
port name, we type “a”, “b” and “c” and set the “Direction” fields as “in” for the
first two fields and “out” for the third field (c).
● Click “Next” to view and verify the summary of the information about the new
source. If any changes are to be made, just click “Cancel”.
● After making sure that the description of the module is correct, click “Finish.”
The source file will now be displayed in the “ISE Project Navigator” as shown
below; the workspace window is used as a text editor to make the necessary
changes to the source file. All the input and output ports that we specified will be
displayed, and after adding the logic for the output the completed file might look
like the sketch below.
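For reference, after adding the single dataflow assignment for the output, the completed ORgate.vhd could look roughly like this (the library clauses and the default “Behavioral” architecture name follow the wizard output described above; only the assignment to c is written by hand):

library ieee;
use ieee.std_logic_1164.all;

entity ORgate is
  port (
    a : in  std_logic;
    b : in  std_logic;
    c : out std_logic
  );
end ORgate;

architecture Behavioral of ORgate is
begin
  c <= a or b;  -- dataflow description of the 2-input OR gate
end Behavioral;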

8.5 Synthesizing VHDL Code
The design has to be synthesized before it can be checked for correctness by running
functional simulation. XST will analyze the VHDL code and try to gather building blocks in
order to create an efficient implementation, performing resource sharing to reduce area
while increasing clock frequency. In other words, synthesis converts the code into a digital
circuit by transforming it into a netlist of gates.
● Make sure that “Implementation” checkbox is checked from the “View Pane” in the
“Design” Panel.
● From the “Process Pane” in the “Design” Panel, double click on the “Synthesize –
XST” process, which will check the syntax of your code and report any warning and
error messages in the “Transcript Window”, where you can click the “Errors” or
“Warnings” tab. Errors and warnings are indicated by icons next to the messages. You
can right click a message and select “Search for Answer Record” to open the Xilinx
website and show any related answer record. Otherwise, you can right click the message
and select “Go to Source” to go directly to the error. These errors must be corrected and
saved, and a fresh synthesis (compilation) needs to be run again before you move to the
next step; otherwise, you won’t be able to simulate your design. After correcting the
errors (if any), the synthesis process runs without errors and a success mark is displayed
to the left of “Synthesize – XST”.
● After a successful synthesis, you will get a message reporting that the process completed
successfully.

8.6 Simulation of Design


In order to do functional and timing simulation, we will create a test bench for our VHDL
code which will help in debugging our design. This allows us to verify that our design
functions as expected (given inputs in our truth table, we get desired outputs). In order to test
the gate completely, we shall provide all the different input combinations.
● From the tool bar, select Project → New Source
● From the “Select Source Type” options select “VHDL Test Bench”

● In the “File name” field choose a name that signifies the test bench and adheres to
the naming conventions mentioned earlier. Type “testorgate”
● For the “Location” field, click the browse icon to navigate to the appropriate
folder, which should be the same one used for creating the project.
● Click “Next”
● The following window allows you to select which design you want to create a test
bench for – in our case “ORgate”, since it is the only module we have; however, for
your future designs, you can make test benches for individual components of your
designs as well as for the top-level design which ties it all together.
● Click “Next”.
● A summary window like the one shown below will appear; click “Finish”.
● Now you will view the test bench file (testorgate.vhd), shown below, that Xilinx
has generated in the workspace window.
● Let’s modify the default code by removing the clock process that is generated by
default (it toggles the clock every half period), since our OR gate is purely
combinational. We also want to remove the default stimulus process.
● Replace the deleted code with a simple stimulus process that tests the design by
applying the different input values; after the edits, the test bench could look roughly
like the sketch below:
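A possible version, with arbitrary 100 ns intervals between the input combinations (the component declaration and instantiation mirror what ISE generates; only the stimulus process is our own):

library ieee;
use ieee.std_logic_1164.all;

entity testorgate is
end testorgate;

architecture behavior of testorgate is

  -- component under test
  component ORgate
    port (
      a : in  std_logic;
      b : in  std_logic;
      c : out std_logic
    );
  end component;

  -- inputs
  signal a : std_logic := '0';
  signal b : std_logic := '0';
  -- output
  signal c : std_logic;

begin

  -- instantiate the Unit Under Test (UUT)
  uut: ORgate port map (a => a, b => b, c => c);

  -- stimulus: apply all four input combinations, 100 ns apart
  stim_proc: process
  begin
    a <= '0'; b <= '0'; wait for 100 ns;  -- expect c = '0'
    a <= '0'; b <= '1'; wait for 100 ns;  -- expect c = '1'
    a <= '1'; b <= '0'; wait for 100 ns;  -- expect c = '1'
    a <= '1'; b <= '1'; wait for 100 ns;  -- expect c = '1'
    wait;                                 -- suspend the process at the end of the test
  end process;

end behavior;

Running “Simulate Behavioral Model” on this file should show c following the truth table after each 100 ns step.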
● The test bench file does not appear in the “Hierarchy” Pane of the “Design”
Panel. This is because there are separate views for implementation and test files. In
order to view test files, select the “Simulation” option in the “View Pane” of the
“Design” panel. In the “Process Pane,” double click on “Behavioral Check
Syntax” to make sure that you didn’t make any syntax errors while making
changes.
● Save your work.
● Double click on “Simulate Behavioral Model” in the “Process Pane”, which
will open the ISim software with your test bench loaded.
● The ISim simulator window will open with your simulation executed. Here you are
able to simulate your designs and check for errors: you can step through your VHDL
designs, check the states of signals, and set the simulation to run for a specific period
of time. Make sure to check the results of the simulation output against your truth
table results to verify the correctness of the design. The resolution of the simulation is
set to 1 picosecond to ensure correct processing of your design.
● To get a better view of the simulation waveforms, from the tool bar click on
View → Zoom → Full View, use F6, or click on the “Zoom to Full View” shortcut
icon. This will give you a better view of what your simulation is doing.

● In the text box located near the Run button, you may specify the amount of time for
the simulation to run; the button to the left of the box will execute the simulation
for the time you have specified. After setting the new simulation time, click on
Re-Start to clear the previous simulation result and then click on Run to
start simulating with the new time setting. Below is an example with 2 µs of simulation
time.

Figure 8.5: Change of simulation Run time

● You can change the default simulation run time. This will help to avoid setting the run
time again every time you launch the simulation. This can be done by setting the
properties of your project in Xilinx.
● Right click on Simulate Behavioral Model, and then click on Process Properties…;
an ISim property window will appear. You can modify the simulation run time from
the value textbox.
● In some cases, you may want to change the display format of a specific signal from
binary into another format. This can be done by right clicking on that signal, then
clicking on Radix and choosing the desired display format, for example changing the
display from binary to hexadecimal.
● When your design is big, it is not easy to find a mistake just by looking at your HDL
code. In this case, you may want to inspect the internal signals of a specific component
to see whether it is working properly. To do this, open both the Instances and
Processes panel and the Objects panel.

CONCLUSION
CRC is a method that can detect errors in data transferred between two points.
In this work, CRC has also been used for single-bit error correction. We have implemented
CRC-16 in hardware; hardware implementation on an FPGA can be effectively used to improve the
performance of CRC calculations.

Only components such as shift registers, XOR gates and NOR gates are used, so the entire circuit
can be easily designed. This approach is efficient in terms of both hardware and speed.

REFERENCES
[1] G. Campobello, G. Patane and M. Russo, "Parallel CRC realization," IEEE Transactions on
Computers, vol. 52, no. 10, pp. 1312-1319, Oct. 2003.
[2] G. Albertengo and R. Sisto, "Parallel CRC generation," IEEE Micro, vol. 10, no. 5,
pp. 63-71, Oct. 1990.
[3] M. D. Shieh et al., "A Systematic Approach for Parallel CRC Computations," Journal of
Information Science and Engineering, May 2001.
[4] F. Braun and M. Waldvogel, "Fast incremental CRC updates for IP over ATM networks,"
IEEE Workshop on High Performance Switching and Routing, pp. 48-52, 2001.
[5] Weidong Lu and Stephan Wong, "A Fast CRC Update Implementation," IEEE Workshop
on High Performance Switching and Routing, pp. 113-120, Oct. 2003.
[6] S. R. Ruckmani and P. Anbalagan, "High Speed Cyclic Redundancy Check for USB,"
DSP Journal, vol. 6, issue 1, Sept. 2006.
[7] Yan Sun and Min Sik Kim, "A Pipelined CRC Calculation Using Lookup Tables,"
7th IEEE Consumer Communications and Networking Conference (CCNC), pp. 1-2, Jan. 2010.
[8] M. Sprachmann, "Automatic generation of parallel CRC circuits," IEEE Design & Test of
Computers, vol. 18, no. 3, pp. 108-114, May 2001.
