Dup2 - For Merge
This microprocessor is unique in the fact that its 1.4 Billion transistor count, capable
of a teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count
is largely due to the 24MB L3 cache). Current designs, as opposed to the earliest devices, use
extensive design automation and automated logic synthesis to lay out the transistors, enabling
higher levels of complexity in the resulting logic functionality. Certain high-performance
logic blocks like the SRAM cell, however, are still designed by hand to ensure the highest
efficiency (sometimes by bending or breaking established design rules to obtain the last bit of
performance by trading stability).
1.3 Applications
● Electronic systems in cars
● Digital electronic controls in VCRs
● Transaction processing systems, such as ATMs
● Personal computers and workstations
● Medical electronic systems
LITERATURE SURVEY
2.1 Introduction
There are several techniques for generating check bits that can be added to a
message. Perhaps the simplest is to append a single bit, called the “parity bit,” which makes
the total number of 1-bits in the code vector (message with parity bit appended) even (or
odd). If a single bit gets altered in transmission, this will change the parity from even to odd
(or the reverse). The sender generates the parity bit by simply summing the message bits
modulo 2, that is, by XORing them together. It then appends the parity bit (or its complement)
to the message. The receiver can check the message by summing all message bits modulo 2
and checking that the sum agrees with the parity bit. Equivalently, the receiver can sum all
the bits (message and parity) and check that the result is 0 (if even parity is being used).
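The parity scheme just described can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original material; the function names and the 7-bit sample message are assumptions chosen for the example.

```python
def parity_bit(bits, even=True):
    """Return the parity bit for a sequence of 0/1 message bits."""
    p = 0
    for b in bits:
        p ^= b                      # summing modulo 2 is the same as XOR
    return p if even else p ^ 1     # odd parity appends the complement


def check_even_parity(code_vector):
    """Receiver side: the XOR of all bits (message + parity) must be 0."""
    total = 0
    for b in code_vector:
        total ^= b
    return total == 0


message = [1, 0, 1, 1, 0, 0, 1]                 # hypothetical 7-bit message
code_vector = message + [parity_bit(message)]   # message with parity bit appended
```

Flipping any single bit of `code_vector` makes `check_even_parity` return False, while flipping two bits goes unnoticed.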
This simple parity technique is often said to detect 1-bit errors. Actually it detects
errors in any odd number of bits (including the parity bit), but it is a small comfort to know
you are detecting 3-bit errors if you are missing 2-bit errors. For bit serial sending and
receiving, the hardware to generate and check a single parity bit is very simple. It consists of
a single XOR gate together with some control circuitry. For bit parallel transmission, an XOR
tree may be used, as illustrated in Figure 2.1.
Figure 2.1 shows XOR tree
Figure 2.1: XOR tree
Other techniques for computing a checksum are to form the XOR of all the bytes in the
message, or to compute a sum with end-around carry of all the bytes. In the latter method the
carry from each 8-bit sum is added into the least significant bit of the accumulator. It is
believed that this is more likely to detect errors than the simple XOR, or the sum of the bytes
with carry discarded. A technique that is believed to be quite good in terms of error detection,
and which is easy to implement in hardware, is the cyclic redundancy check. This is another
way to compute a checksum, usually 8, 16, or 32 bits in length, that is appended to the
message. We will briefly review the theory and then give some algorithms for computing in
software a commonly used 32-bit CRC checksum.
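As a rough comparison of the simpler checksums mentioned above, the following Python sketch computes the byte-wise XOR, the sum with the carry discarded, and the sum with end-around carry. The function names and sample data are illustrative assumptions.

```python
def xor_checksum(data: bytes) -> int:
    """XOR of all the bytes in the message."""
    acc = 0
    for byte in data:
        acc ^= byte
    return acc


def discard_carry_sum(data: bytes) -> int:
    """8-bit sum of the bytes with the carry thrown away."""
    return sum(data) & 0xFF


def end_around_carry_sum(data: bytes) -> int:
    """8-bit sum where each carry is added back into the low-order bit."""
    acc = 0
    for byte in data:
        acc += byte
        acc = (acc & 0xFF) + (acc >> 8)   # end-around carry
    return acc


good = bytes([0x00, 0x00])
bad = bytes([0x80, 0x80])   # the high-order bit of both bytes was corrupted
```

For this particular pair, the XOR and the carry-discarding sum are identical for `good` and `bad`, while the end-around carry sum differs (0 versus 1), which is the kind of case behind the belief that it detects more errors.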
2.2 Theory
The CRC is based on polynomial arithmetic, in particular, on computing the
remainder of dividing one polynomial in GF(2) (Galois field with two elements) by another.
It is a little like treating the message as a very large binary number, and computing the
remainder on dividing it by a fairly large prime. Intuitively, one would expect this to
give a reliable checksum.
A polynomial in GF(2) is a polynomial in a single variable x whose coefficients
are 0 or 1. Addition and subtraction are done modulo 2, that is, they are both the same as the
XOR operator. For example, the sum of the polynomials x^3 + x + 1 and x^4 + x^3 + x^2 + x is
x^4 + x^2 + 1, as is their difference. These polynomials are not usually written with minus signs,
but they could be, because a coefficient of –1 is equivalent to a coefficient of 1.
Multiplication of such polynomials is straightforward. The product of one coefficient by
another is the same as their combination by the logical AND operator, and the partial
products are summed using XOR. Multiplication is not needed to compute the CRC
checksum. Division of polynomials over GF(2) can be done in much the same way as long
division of polynomials over the integers.
Below is an example.
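The arithmetic over GF(2) described above can be sketched in Python by storing a polynomial as an integer whose bit i is the coefficient of x^i. This representation and the function names are assumptions made for illustration.

```python
def gf2_add(a: int, b: int) -> int:
    """Addition (and subtraction) of GF(2) polynomials is bitwise XOR."""
    return a ^ b


def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials: AND for coefficients, XOR for partial products."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def gf2_divmod(dividend: int, divisor: int):
    """Long division of GF(2) polynomials; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient, remainder = 0, dividend
    dlen = divisor.bit_length()
    while remainder.bit_length() >= dlen:
        shift = remainder.bit_length() - dlen
        quotient ^= 1 << shift            # next quotient term
        remainder ^= divisor << shift     # subtract (XOR) the shifted divisor
    return quotient, remainder


# (x^3 + x + 1) + (x^4 + x^3 + x^2 + x) = x^4 + x^2 + 1
total = gf2_add(0b1011, 0b11110)

# Divide x^7 + x^6 + x^5 + x^2 + x by x^3 + x + 1
q, r = gf2_divmod(0b11100110, 0b1011)
```

Here `q` is x^4 + x^3 + 1 and `r` is x^2 + 1, and `gf2_mul(q, 0b1011) ^ r` gives back the dividend.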
We can verify that the quotient of x^4 + x^3 + 1 multiplied by the divisor of x^3 + x + 1, plus the
remainder of x^2 + 1, equals the dividend. The CRC method treats the message as a polynomial
in GF(2). For example, the message 11001001, where the order of transmission is from left to
right (110…), is treated as a representation of the polynomial x^7 + x^6 + x^3 + 1. The sender and
receiver agree on a certain fixed polynomial called the generator polynomial. For example,
for a 16-bit CRC the CCITT has chosen the polynomial x^16 + x^12 + x^5 + 1, which is now widely
used for a 16-bit CRC checksum. To compute an r-bit CRC checksum, the generator
polynomial must be of degree r. The sender appends r 0-bits to the m-bit message and divides
the resulting polynomial of degree m + r – 1 by the generator polynomial. This produces a
remainder polynomial of degree r – 1 (or less). The remainder polynomial has r coefficients,
which are the checksum. The quotient polynomial is discarded. The data transmitted (the
code vector) is the original m-bit message followed by the r-bit checksum.
There are two ways for the receiver to assess the correctness of the transmission. It can
compute the checksum from the first m bits of the received data, and verify that it agrees with
the last r received bits. Alternatively, and following usual practice, the receiver can divide all
the m+r received bits by the generator polynomial and check that the r-bit remainder is 0. To
see that the remainder must be 0, let M be the polynomial representation of the message, and
let R be the polynomial representation of the remainder that was computed by the sender.
Then the transmitted data corresponds to the polynomial M·x^r – R (or, equivalently, M·x^r + R).
By the way R was computed, we know that M·x^r = QG + R, where G is the generator
polynomial and Q is the quotient (that was discarded). Therefore the transmitted data, M·x^r – R,
is equal to QG, which is clearly a multiple of G. If the receiver is built as nearly as possible
just like the sender, the receiver will append r 0-bits to the received data as it computes the
remainder R. But the received data with 0-bits appended is still a multiple of G, so the
computed remainder is still 0.
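The sender and receiver procedures can be sketched as follows. Messages and generators are held as Python integers (most significant bit transmitted first), which is an assumed representation; note that an integer drops leading 0-bits, which mirrors the insensitivity to leading zeros of the basic method. The function names are illustrative.

```python
def gf2_mod(value: int, generator: int) -> int:
    """Remainder of GF(2) polynomial division."""
    glen = generator.bit_length()
    while value.bit_length() >= glen:
        value ^= generator << (value.bit_length() - glen)
    return value


def crc_encode(message: int, generator: int) -> int:
    """Sender: append r 0-bits, divide, and replace the 0-bits with the remainder."""
    r = generator.bit_length() - 1
    shifted = message << r                    # M * x^r
    return shifted | gf2_mod(shifted, generator)


def crc_check(received: int, generator: int) -> bool:
    """Receiver: the whole code vector must be a multiple of G."""
    return gf2_mod(received, generator) == 0


G = 0b1011                              # x^3 + x + 1
code_vector = crc_encode(0b11100110, G)
```

For this message the 3-bit checksum is 100, so `code_vector` is 11100110100 and `crc_check(code_vector, G)` holds. Appending further 0-bits to the code vector leaves the remainder at 0, as argued above.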
That’s the basic idea, but in reality the process is altered slightly to correct for such
deficiencies as the fact that the method as described is insensitive to the number of leading
and trailing 0-bits in the data transmitted. In particular, if a failure occurred that caused the
received data, including the checksum, to be all-0, it would be accepted. Choosing a “good”
generator polynomial is something of an art. Two simple observations: For an r-bit
checksum, G should be of degree r, because otherwise the first bit of the checksum would
always be 0, which wastes a bit of the checksum. Similarly, the last coefficient should be 1
(that is, G should not be divisible by x), because otherwise the last bit of the checksum would
always be 0 (because M·x^r = QG + R, if G is divisible by x, then R must be also).
The following facts about generator polynomials are proved in [PeBr] and/or [Tanen]:
● If G contains two or more terms, all single-bit errors are detected.
● If G is not divisible by x (that is, if the last term is 1), and e is the least positive
integer such that G evenly divides x^e + 1, then all double errors that are within a
frame of e bits are detected. A particularly good polynomial in this respect is
x^15 + x^14 + 1, for which e = 32767.
● If x+1 is a factor of G, all errors consisting of an odd number of bits are detected.
● An r-bit CRC checksum detects all burst errors of length ≤ r. (A burst error of
length r is a string of r bits in which the first and last are in error, and the
intermediate r-2 bits may or may not be in error.) The generator polynomial x+1
creates a checksum of length 1, which applies even parity to the message.
● It is interesting to note that if a code of any type can detect all double-bit and
single-bit errors, then it can in principle correct single-bit errors. To see this,
suppose data containing a single-bit error is received. Imagine complementing all
the bits, one at a time. In all cases but one, this results in a double-bit error, which
is detected. But when the erroneous bit is complemented, the data is error-free,
which is recognized. In spite of this, the CRC method does not seem to be used for
single-bit error correction. Instead, the sender is requested to repeat the whole
transmission if any error is detected.
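The bit-complementing argument for single-bit correction can be tried out directly. The sketch below assumes the generator x^3 + x + 1 and a 7-bit code vector, a case in which all double-bit errors fall within the frame of e = 7 bits and are therefore detected; the function names are illustrative.

```python
def gf2_mod(value: int, generator: int) -> int:
    """Remainder of GF(2) polynomial division."""
    glen = generator.bit_length()
    while value.bit_length() >= glen:
        value ^= generator << (value.bit_length() - glen)
    return value


def correct_single_bit(received: int, nbits: int, generator: int):
    """Complement each bit in turn and keep the variant that passes the CRC check.

    Returns the corrected code vector, or None if no unique candidate exists.
    """
    if gf2_mod(received, generator) == 0:
        return received                       # already error-free
    candidates = [
        received ^ (1 << i)
        for i in range(nbits)
        if gf2_mod(received ^ (1 << i), generator) == 0
    ]
    return candidates[0] if len(candidates) == 1 else None


G = 0b1011                 # x^3 + x + 1
sent = 0b1101001           # message 1101 followed by checksum 001
garbled = sent ^ (1 << 4)  # one bit damaged in transit
repaired = correct_single_bit(garbled, 7, G)
```

With these assumptions `repaired` equals `sent` for an error in any one of the seven bit positions.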
2.3 Hardware
To develop a hardware circuit for computing the CRC checksum, we reduce the
polynomial division process to its essentials.
The process employs a shift register, which we denote by CRC. This is of length r (the
degree of G) bits, not r+1 as we might expect. When the subtractions (exclusive or’s) are
done, it is not necessary to represent the high-order bit, because the high-order bits of G and
the quantity it is being subtracted from are both 1.
The division process might be described informally as follows:
1. Initialize the CRC register to all 0-bits.
2. Get first/next message bit m.
3. If the high-order bit of CRC is 1, shift CRC and m together left 1 position, and XOR the
result with the low-order r bits of G. Otherwise, just shift CRC and m left 1 position.
4. If there are more message bits, go back to step 2 to get the next one.
It might seem that the subtraction should be done first, and then the shift. It would be
done that way if the CRC register held the entire generator polynomial, which in bit form is
r+1 bits. Instead, the CRC register holds only the low-order r bits of G, so the shift is done
first, to align things properly.
Below is shown the contents of the CRC register for the generator G = x^3 + x + 1 and the
message M = x^7 + x^6 + x^5 + x^2 + x. Expressed in binary, G = 1011 and M = 11100110.
1. 000 Initial CRC contents. High-order bit is 0, so just shift in first message bit, giving:
2. 001 High-order bit is 0, so just shift in second message bit, giving:
3. 011 High-order bit is 0 again, so just shift in third message bit, giving:
4. 111 High-order bit is 1, so shift in fourth message bit and then XOR with 011, giving:
5. 101 High-order bit is 1, so shift in fifth message bit and then XOR with 011, giving:
6. 001 High-order bit is 0, so just shift in sixth message bit, giving:
7. 011 High-order bit is 0, so just shift in seventh message bit, giving:
8. 111 High-order bit is 1, so shift in eighth message bit and then XOR with 011, giving:
9. 101 There are no more message bits, so this is the remainder.
These steps can be implemented with the (simplified) circuit shown in Figure 2.2, which is
known as a feedback shift register.
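The register steps listed above can be reproduced with a short software model of the shift register. This is a sketch under the stated conventions (r-bit register, message bits shifted in at the low-order end); the function name and the optional trace list are assumptions.

```python
def lfsr_remainder(message_bits, generator, trace=None):
    """Bit-serial polynomial division using an r-bit CRC register."""
    r = generator.bit_length() - 1
    mask = (1 << r) - 1
    g_low = generator & mask          # low-order r bits of G
    crc = 0                           # step 1: initialise to all 0-bits
    for m in message_bits:            # step 2: get the next message bit
        high = (crc >> (r - 1)) & 1   # high-order bit before the shift
        crc = ((crc << 1) | m) & mask # shift CRC and m left one position
        if high:
            crc ^= g_low              # step 3: XOR with the low-order bits of G
        if trace is not None:
            trace.append(format(crc, "0{}b".format(r)))
    return crc


states = []
remainder = lfsr_remainder([1, 1, 1, 0, 0, 1, 1, 0], 0b1011, trace=states)
```

For G = 1011 and M = 11100110, `states` holds the register contents after each message bit: 001, 011, 111, 101, 001, 011, 111, 101, and the final remainder is 101.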
Figure 2.2: Polynomial division circuit for G = x^3 + x + 1
The three boxes in the figure represent the three bits of the CRC register. When a
message bit comes in, if the high-order bit (x^2 box) is 0, simultaneously the message bit is
shifted into the x^0 box, the bit in x^0 is shifted to x^1, the bit in x^1 is shifted to x^2, and the bit in x^2
is discarded. If the high-order bit of the CRC register is 1, then a 1 is present at the lower
input of each of the two XOR gates. When a message bit comes in, the same shifting takes
place but the three bits that wind up in the CRC register have been XORed with binary 011.
When all the message bits have been processed, the CRC holds M mod G. If the circuit of
Figure 2.2 were used for the CRC calculation, then after processing the message, r (in this
case 3) 0-bits would have to be fed in. Then the CRC register would have the desired
checksum, M·x^r mod G. However, there is a way to avoid this step with a simple rearrangement of
the circuit.
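The text does not spell out the rearrangement. One common form, shown here as an assumption rather than as the circuit the original figure depicts, XORs each message bit with the high-order register bit to form the feedback, so that the register holds M·x^r mod G as soon as the last message bit has been processed. The sketch compares it with the straightforward circuit fed r extra 0-bits.

```python
def crc_with_padding(message_bits, generator):
    """Division circuit of the basic kind, followed by r extra 0-bits."""
    r = generator.bit_length() - 1
    mask = (1 << r) - 1
    crc = 0
    for m in list(message_bits) + [0] * r:
        high = (crc >> (r - 1)) & 1
        crc = ((crc << 1) | m) & mask
        if high:
            crc ^= generator & mask
    return crc


def crc_without_padding(message_bits, generator):
    """Rearranged circuit (assumed form): message bit enters at the feedback."""
    r = generator.bit_length() - 1
    mask = (1 << r) - 1
    crc = 0
    for m in message_bits:
        feedback = ((crc >> (r - 1)) & 1) ^ m   # high-order bit XOR message bit
        crc = (crc << 1) & mask
        if feedback:
            crc ^= generator & mask
    return crc


checksum = crc_without_padding([1, 1, 1, 0, 0, 1, 1, 0], 0b1011)
```

Both functions give the same checksum for every message, but the second needs only as many clock steps as there are message bits.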
ERROR CORRECTING CODES
3.1 Introduction
Environmental interference and physical defects in the communication medium can
cause random bit errors during data transmission. Error coding is a method of detecting
and correcting these errors to ensure information is transferred intact from its source
to its destination. Error coding is used for fault tolerant computing in computer
memory, magnetic and optical data storage media, satellite and deep space
communications, network communications, cellular telephone networks, and almost any
other form of digital data communication. Error coding uses mathematical formulas to
encode data bits at the source into longer bit words for transmission. The "code word" can
then be decoded at the destination to retrieve the information. The extra bits in the code
word provide redundancy that, according to the coding scheme used, will allow the
destination to use the decoding process to determine if the communication medium
introduced errors and in some cases correct them so that the data need not be retransmitted.
Different error coding schemes are chosen depending on the types of errors
expected, the communication medium's expected error rate, and whether or not data
retransmission is possible. Faster processors and better communications technology make
more complex coding schemes, with better error detecting and correcting capabilities,
possible for smaller embedded systems, allowing for more robust communications.
However, tradeoffs between bandwidth and coding overhead, coding complexity and
allowable coding delay between transmissions, must be considered for each application.
Even if we know what types of errors can occur, we cannot simply recognize them by inspection. We
can detect them by comparing the received copy with another copy of the
intended transmission. In this mechanism the source data block is sent twice. The
receiver compares them with the help of a comparator, and if the two blocks differ, a
request for retransmission is made. To achieve forward error correction, three copies of the
same data block are sent and a majority decision selects the correct block. These methods
are very inefficient and increase the traffic two or three times. Fortunately, there are more
efficient error detection and correction codes.
There are two basic strategies for dealing with errors. One way is to include
enough redundant information (extra bits are introduced into the data stream at the
transmitter on a regular and logical basis) along with each block of data sent to enable
the receiver to deduce what the transmitted character must have been. The other way is to
include only enough redundancy to allow the receiver to deduce that an error has occurred, but
not which error has occurred, so the receiver asks for a retransmission. The former
strategy uses error-correcting codes and the latter uses error-detecting codes. To
understand how errors can be handled, it is necessary to look closely at what an error really is.
Normally, a frame consists of m-data bits (i.e., message bits) and r-redundant bits (or check
bits). Let the total number of bits be n (m + r). An n-bit unit containing data and check-bits is
often referred to as an n-bit codeword.
Given any two code words, say 10010101 and 11010100, it is possible to determine
how many corresponding bits differ: just EXCLUSIVE OR the two code words and count
the number of 1’s in the result. The number of bit positions in which two code words differ is
called the Hamming distance. If two code words are a Hamming distance d apart, it will
require d single-bit errors to convert one code word into the other. The error-detecting and
error-correcting properties of a code depend on its Hamming distance.
To detect d errors, you need a distance (d+1) code because with such a code there is no
way that d single-bit errors can change a valid code word into another valid code word.
Whenever the receiver sees an invalid code word, it can tell that a transmission error has
occurred.
Similarly, to correct d errors, you need a distance (2d+1) code because that way
the legal code words are so far apart that even with d changes, the original code word is still
closer than any other code word, so it can be uniquely determined.
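The Hamming distance and the detection and correction limits that follow from it can be computed as in the sketch below. The function names and the 3-bit repetition code used as a second example are assumptions for illustration.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """XOR the two code words and count the 1's in the result."""
    return bin(a ^ b).count("1")


def minimum_distance(codewords) -> int:
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def detectable_errors(d_min: int) -> int:
    """A distance (d+1) code detects up to d errors."""
    return d_min - 1


def correctable_errors(d_min: int) -> int:
    """A distance (2d+1) code corrects up to d errors."""
    return (d_min - 1) // 2


distance = hamming_distance(0b10010101, 0b11010100)
repetition_code = [0b000, 0b111]          # hypothetical 3-bit repetition code
d_min = minimum_distance(repetition_code)
```

The two code words from the text differ in 2 bit positions. The repetition code has minimum distance 3, so it can detect two errors or correct one.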
First, various types of errors are introduced in Sec. 3.2.2, followed by different
error detecting codes in Sec. 3.2.3. Finally, error correcting codes are introduced
in Sec. 3.2.4.
3.2 Types of errors
These interferences can change the timing and shape of the signal. If the
signal is carrying binary encoded data, such changes can alter the meaning of the
data. These errors can be divided into two types: Single-bit error and Burst error.
Single bit error:
The term single-bit error means that only one bit of a given data unit (such as
a byte or character) is changed from 1 to 0 or from 0 to 1 as shown in fig
Figure: 3.2 Burst errors
Burst errors are most likely to happen in serial transmission. The duration of the
noise is normally longer than the duration of a single bit, which means that when noise affects
data, it affects a set of bits, as shown in Fig. 3.2.2. The number of bits affected depends on the
data rate and the duration of the noise.
At the receiving end the parity bit is computed from the received data bits and
compared with the received parity bit, as shown in Fig. 3.2.3. This scheme makes the total
number of 1’s even, that is why it is called even parity checking. Considering a 4-bit word,
different combinations of the data words and the corresponding code words are given in
Table 3.2.1.
Note that for the sake of simplicity, we are discussing here the even-parity
checking, where the number of 1’s should be an even number. It is also possible to use
odd-parity checking, where the number of 1’s should be odd.
Performance:
An observation of the table reveals that to move from one code word to another, at
least two bits must be changed. Hence this set of code words is said to
have a minimum distance (Hamming distance) of 2, which means that a receiver that
has knowledge of the code word set can detect all single-bit errors in each code
word. However, if two errors occur in the code word, it becomes another valid member of the
set and the decoder will see only another valid code word and know nothing of the error.
Thus errors in an even number of bits cannot be detected. In fact, it can be shown that a single
parity check code detects only an odd number of errors in a code word.
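The code word set for 4-bit data words with even parity can be generated and its minimum distance checked with a short script. This is a sketch; the placement of the parity bit as the last bit and the names used are assumptions.

```python
from itertools import combinations


def even_parity_codeword(data: int) -> int:
    """Append an even-parity bit to a data word (parity bit is the last bit)."""
    parity = bin(data).count("1") & 1
    return (data << 1) | parity


# All sixteen 4-bit data words with their 5-bit code words.
table = [(d, even_parity_codeword(d)) for d in range(16)]

# Minimum Hamming distance over every pair of code words.
min_distance = min(
    bin(a ^ b).count("1")
    for (_, a), (_, b) in combinations(table, 2)
)

for data, code in table:
    print(format(data, "04b"), format(code, "05b"))
```

The computed minimum distance is 2, in line with the observation that all single-bit errors are detected and double-bit errors are not.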
Performance can be improved by using two-dimensional parity check, which
organizes the block of bits in the form of a table. Parity check bits are calculated for
each row, which is equivalent to a simple parity check bit. Parity check bits are also
calculated for all columns then both are sent along with the data. At the receiving end
these are compared with the parity bits calculated on the received data. This is illustrated in
Fig. 3.2.4.
Performance
Two-dimensional parity checking increases the likelihood of detecting burst errors. As
shown in Fig. 3.2.4, a 2-D parity check of n bits can detect a burst error of n bits.
A burst error of more than n bits is also detected by the 2-D parity check with high probability.
There is, however, one pattern of error that remains elusive. If two bits in one data unit are
damaged and two bits in exactly the same positions in another data unit are also damaged, the 2-D
parity checker will not detect an error. For example, consider two data units, 11001100 and
10101100. If the first and second-from-last bits in each of them are changed, making the data units
01001110 and 00101110, the error cannot be detected by the 2-D parity check.
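The undetected pattern can be checked with a small sketch of the two-dimensional parity calculation. Even parity and the helper names are assumptions for the example.

```python
def to_bits(text: str):
    """Convert a string such as '11001100' into a list of 0/1 integers."""
    return [int(ch) for ch in text]


def two_d_parity(rows):
    """Return (row parities, column parities) using even parity."""
    row_parity = [sum(row) % 2 for row in rows]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    return row_parity, col_parity


sent = [to_bits("11001100"), to_bits("10101100")]
received = [to_bits("01001110"), to_bits("00101110")]

# The same two bit positions are damaged in both rows, so every row and
# column parity is unchanged and the error slips through.
undetected = two_d_parity(sent) == two_d_parity(received)

# A single damaged bit changes one row parity and one column parity.
single = [row[:] for row in sent]
single[0][3] ^= 1
detected = two_d_parity(sent) != two_d_parity(single)
```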
In the checksum error detection scheme, the data is divided into k segments, each of m
bits. At the sender’s end the segments are added using 1’s complement arithmetic to get the
sum. The sum is complemented to get the checksum. The checksum segment is sent along
with the data segments, as shown in Fig. 3.2.5 (a). At the receiver’s end, all received segments
are added using 1’s complement arithmetic to get the sum. The sum is complemented. If the
result is zero, the received data is accepted; otherwise it is discarded, as shown in Fig.
3.2.5 (b).
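The sender and receiver calculations can be sketched as follows, assuming m = 8 bits per segment. The segment values and function names are made up for illustration.

```python
def ones_complement_sum(segments, m=8):
    """Add m-bit segments, wrapping each carry back into the low-order bit."""
    mask = (1 << m) - 1
    total = 0
    for seg in segments:
        total += seg
        total = (total & mask) + (total >> m)
    return total


def make_checksum(segments, m=8):
    """Sender: complement of the 1's complement sum."""
    return ones_complement_sum(segments, m) ^ ((1 << m) - 1)


def checksum_ok(segments_with_checksum, m=8):
    """Receiver: the complemented sum of everything received must be zero."""
    mask = (1 << m) - 1
    return ones_complement_sum(segments_with_checksum, m) ^ mask == 0


data = [0xB3, 0xAB, 0x5A, 0xD5]          # hypothetical 8-bit segments
checksum = make_checksum(data)
frame = data + [checksum]
```

For these segments the 1's complement sum is 0x8F and the checksum is 0x70; the receiver accepts `frame` and rejects it if any single bit is changed.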
Performance
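The checksum detects all single-bit errors and most errors involving more than one bit. An
error can go undetected when the changes in different segments cancel in the sum, for example
when a bit goes from 0 to 1 in one segment and the same bit position goes from 1 to 0 in another.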
Figure 3.5 (a) Sender’s end for the calculation of the checksum,
(b) Receiving end for checking the checksum
The Cyclic Redundancy Check (CRC) is one of the most powerful and easy-to-implement techniques.
Unlike the checksum scheme, which is based on addition, CRC is based on binary division. In
CRC, a sequence of redundant bits, called cyclic redundancy check bits, is
appended to the end of the data unit so that the resulting data unit becomes exactly divisible by a
second, predetermined binary number. At the destination, the incoming data unit is divided by
the same number. If at this step there is no remainder, the data unit is assumed to be correct
and is therefore accepted. A remainder indicates that the data unit has been damaged in
transit and therefore must be rejected. The generalized technique can be explained as
follows.
If a k-bit message is to be transmitted, the transmitter generates an r-bit sequence,
known as the Frame Check Sequence (FCS), so that (k+r) bits are actually
transmitted. This r-bit FCS is generated by dividing the original number, appended by r
zeros, by a predetermined number. This number, which is (r+1) bits in length, can also be
considered as the coefficients of a polynomial, called the Generator Polynomial.
The remainder of this division process is the r-bit FCS. On receiving
the packet, the receiver divides the (k+r)-bit frame by the same predetermined number, and if
it produces no remainder, it can be assumed that no error has occurred during the
transmission. Operations at both the sender and receiver end are shown in Fig.
Figure 3.6: Basic scheme for Cyclic Redundancy Checking
The mathematical operation performed is illustrated in Fig. 3.2.7 by dividing a sample
4-bit number by the coefficients of the generator polynomial x^3 + x + 1, which is 1011, using
modulo-2 arithmetic. Modulo-2 arithmetic is a binary addition process without any carry
over, which is just the Exclusive-OR operation. Consider the case where k = 1101. Hence we
have to divide 1101000 (i.e. k appended by 3 zeros) by 1011, which produces the remainder
r = 001, so that the bit frame (k+r) = 1101001 is actually being transmitted through the
communication channel. At the receiving end, if the received number, i.e., 1101001, is divided
by the same generator polynomial 1011 and the remainder is 000, it can be assumed that
the data is free of errors.
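The modulo-2 long division in this example can be traced step by step with a short script. This is a sketch working on bit strings; the function name and the form of the trace are assumptions.

```python
def mod2_division_steps(dividend_bits: str, divisor_bits: str):
    """Modulo-2 long division; returns the intermediate rows and the remainder."""
    work = [int(b) for b in dividend_bits]
    divisor = [int(b) for b in divisor_bits]
    steps = []
    for i in range(len(work) - len(divisor) + 1):
        if work[i] == 1:                       # divisor "goes into" this position
            for j, d in enumerate(divisor):
                work[i + j] ^= d               # subtract without carry (XOR)
            steps.append("".join(str(b) for b in work))
    remainder = "".join(str(b) for b in work[-(len(divisor) - 1):])
    return steps, remainder


sender_steps, fcs = mod2_division_steps("1101000", "1011")
_, receiver_remainder = mod2_division_steps("1101001", "1011")
```

The sender's division passes through 0110000, 0011100, 0001010 and 0000001, leaving the remainder 001; dividing the transmitted frame 1101001 leaves 000.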
The CRC process can be expressed as X^n M(X) / P(X) = Q(X) + R(X) / P(X). Commonly used
divisor polynomials are:
• CRC-16 = X^16 + X^15 + X^2 + 1
• CRC-CCITT = X^16 + X^12 + X^5 + 1
• CRC-32 = X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 +
X^2 + X + 1
Performance
CRC is a very effective error detection technique. If the divisor is chosen according to the
previously mentioned rules, its performance can be summarized as follows:
• CRC can detect all single-bit errors.
• CRC can detect all double-bit errors (provided the divisor has at least three 1’s).
• CRC can detect any odd number of errors (provided the divisor contains the factor X + 1).
• CRC can detect all burst errors of length less than or equal to the degree of the polynomial.
• CRC detects most of the larger burst errors with a high probability.
• For example, CRC-12 detects 99.97% of errors with a length of 12 or more.
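The burst-error property can be checked exhaustively for a small case. Because CRC is linear, an error pattern goes undetected exactly when the pattern itself is divisible by the generator, so it is enough to test the error patterns. The sketch below assumes the generator 1011 (degree 3) and a 7-bit frame; the function names are illustrative.

```python
def gf2_mod(value: int, generator: int) -> int:
    """Remainder of modulo-2 polynomial division."""
    glen = generator.bit_length()
    while value.bit_length() >= glen:
        value ^= generator << (value.bit_length() - glen)
    return value


def burst_patterns(length: int):
    """All bursts of the given length: first and last bits in error."""
    if length == 1:
        return [1]
    return [(1 << (length - 1)) | (mid << 1) | 1 for mid in range(1 << (length - 2))]


def undetected_bursts(generator: int, frame_bits: int, length: int):
    """Error patterns of this burst length that leave a zero remainder."""
    missed = []
    for pattern in burst_patterns(length):
        for shift in range(frame_bits - length + 1):
            error = pattern << shift
            if gf2_mod(error, generator) == 0:
                missed.append(error)
    return missed


short_bursts = [undetected_bursts(0b1011, 7, n) for n in (1, 2, 3)]
longer_bursts = undetected_bursts(0b1011, 7, 4)
```

Every burst of length 1 to 3 is detected, while among bursts of length 4 only the pattern equal to the generator itself (at each of its four possible positions) is missed.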
CYCLIC REDUNDANCY CHECK
4.1 Introduction
Cyclic redundancy check is widely used in data communication and storage devices
as a powerful method for dealing with data errors. One of the most established hardware
solutions for CRC calculation is the linear feedback shift register and logic gates. This simple
architecture processes the bits serially. In situations such as high-speed data communications,
the speed of this serial implementation is inadequate. In these cases, parallel
computation of the CRC is desirable. Parallel CRC calculation can significantly increase the
throughput of CRC computations.
For example, the throughput of the 32-bit parallel calculation of CRC-32 can
achieve several gigabits per second. However, that is still not enough for high-speed
applications such as Ethernet networks. A possible solution is to process more bits in parallel.
Variants of CRCs are used in many applications: CRC-16 in BISYNC protocols, CRC-32 in
Ethernet frames for error detection, CRC-8 in ATM, and CRC-CCITT in the X.25 protocol, disc
storage, SDLC, and XMODEM.
4.3 Introduction to CRC
Some systems go to great lengths to detect data errors. Parity is often used with
parallel forms of data, on buses or memories, to detect some errors. It provides a small
measure of robustness by detecting certain bit errors with minimal redundancy. However,
while parity can detect single-bit errors, it can detect only half of all multiple-bit errors.
Other systems go further, employing Hamming codes to not only detect, but in
many instances correct, bit errors. Both of these approaches are applied to data in its parallel
form. Unfortunately, the use of a Hamming code requires many more bits of redundancy, per
character or word, than parity. For transmission of data on high-speed serial channels, the
most prevalent errors are multi-bit bursts. These multi-bit errors make parity worthless, and
severely limit the effectiveness of single-bit correcting Hamming codes.
The large amount of redundancy in a Hamming code (7 bits to protect a 32-bit
word) also makes it a poor choice to protect data across a serial link. Transmission of the
redundant bits in each word can easily consume a fifth of the available link bandwidth, or
require operation of the link at a 20% faster transfer rate to carry the redundant bits.
In reality, bit errors of any type are quite rare in these links (<< 1 in 10^12 bits). Since these
errors cannot generally be corrected by a Hamming code or detected by character parity, the
transmission overhead of these types of detection/correction bits becomes a poor use of link
bandwidth. In systems where data is sent serially across a link, the data integrity of the link
can be much better verified using Cyclic Redundancy Check (CRC) codes.
The CRC algorithm can always be implemented as a software algorithm on a
standard CPU. The software solution is cheap or free in terms of hardware cost; the
drawback is the computational speed. A linear feedback shift register (LFSR) with
serial data feed has been used to implement the CRC algorithm in hardware. Like other hardware
implementations, this method simply performs a division, and the remainder, which is the
resulting CRC checksum, is stored in the registers after each clock cycle. Simplicity and low
power dissipation are the main advantages. This method gives much higher throughput than
the software solution, but it still cannot fulfill the speed requirements of today’s
communication protocols. In order to improve the computational speed of CRC, parallelism
has been introduced.
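To give an idea of what the software solution looks like, here is a bit-at-a-time CRC-32 in Python, compared against the standard library's `zlib.crc32`. The conventions used (bit-reflected polynomial 0xEDB88320, register initialised to all 1's, final inversion) are those of the common Ethernet/zlib CRC-32 and are an assumption beyond what the text specifies.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Software CRC-32, one bit per inner-loop iteration (slow but simple)."""
    crc = 0xFFFFFFFF                        # register preset to all 1's
    for byte in data:
        crc ^= byte                         # bring in the next 8 data bits
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320   # reflected CRC-32 polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                 # final inversion


sample = b"123456789"
software_crc = crc32_bitwise(sample)
library_crc = zlib.crc32(sample)
```

Both values agree. The inner loop runs eight times per byte, which illustrates why a purely bit-serial software implementation is slow.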
cases of the General CRC Generator block and General CRC Syndrome Detector block,
which use a predefined CRC-N polynomial, where N is the number of bits in the checksum.
Fig.4.1 shows a long division example. In the example, the divisor is equal to
“11011,” whereas the dividend is equal to “1000111011000.” The long division process
begins by placing the 5 bits of the divisor below the 5 most significant bits of the dividend.
The next step in the long division process is to find how many times the divisor “11011”
“goes” into the 5 most significant bits of the dividend “10001”.
remainder that results from such long division process is often called CRC or CRC
“checksum” (although CRC is not literally a checksum).
● Most burst errors greater than the degree of the polynomial used.
Figure 1.3 shows the circuit for the CRC-16 polynomial.
The generator polynomial for CRC–16 is listed in Equation 1, and the polynomial for
CRC–32 is listed in Equation 2. These CRC codes are traditionally calculated on the serial
data stream using a Linear Feedback Shift Register (LFSR) built from flip-flops and XOR
gates, as shown in Figure 1.3. The structure for the CRC–32 polynomial is shown in Figure
4.4.
Equation 1: G(x) = x^16 + x^15 + x^2 + 1
Equation 2: G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
In these equations, the superscripts identify the tap location in the shift register. The order of
the polynomial is identified by the highest order term, and specifies the number of flip-flops
in the shift register. These polynomials are evaluated using modulo-2 arithmetic.
The generator polynomial for CRC-32 is as follows:
G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
We can extract the coefficients of G(x) and represent it in binary form as
P = {p32, p31, …, p0}
P = {100000100110000010001110110110111}
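The binary form of P can be derived mechanically from the exponents of G(x), which is a convenient way to check it. The following sketch uses illustrative names.

```python
CRC32_EXPONENTS = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
CRC16_EXPONENTS = [16, 15, 2, 0]


def poly_from_exponents(exponents) -> int:
    """Set bit e for every term x^e of the generator polynomial."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value


def coefficient_string(poly: int) -> str:
    """Coefficients from the highest-order term down to p0."""
    return format(poly, "b")


crc32_poly = poly_from_exponents(CRC32_EXPONENTS)
crc32_bits = coefficient_string(crc32_poly)
crc16_poly = poly_from_exponents(CRC16_EXPONENTS)
```

`crc32_bits` is the 33-character string given above (0x104C11DB7 in hexadecimal), and the CRC-16 polynomial comes out as 0x18005.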
Figure 1.4 shows the circuit for the CRC-32 polynomial.
Figure 4.4: CRC circuit for CRC-32.
The Frame Check Sequence (FCS) is generated after (k + m) cycles, where k is the
number of data bits and m is the order of the generator polynomial. For example, for a serial CRC
with 32 data bits and a generator polynomial of order 32, the FCS is generated after 64 cycles.
Figure 4.5: Parallel CRC structure
This is a parallel CRC block. The next state CRC output is a function of the current state
CRC and the data.
There are different techniques for parallel CRC generation, as follows:
● A Table-Based Algorithm for Pipelined CRC Calculation.
● Fast CRC Update
● F matrix based parallel CRC generation.
● Unfolding, Retiming and pipelining Algorithm
The following paragraphs and tables describe how the CRC–16 polynomial is converted to
calculate eight bits at a time (i.e., a byte basis). The CRC–32 polynomial is converted using a
similar procedure, with the results calculated 16 bits at a time (on a half-word basis). The
results for CRC–32 are presented in Table 5 and Table 6, but without the intermediate
calculations.
4.8 Implementation
1. Ri is the ith bit of the CRC register.
2. Ci is the contents of ith bit of the initial CRC register, before any shifts have taken place.
3. R1 is the least significant bit (LSB).
4. The entries under each CRC register bit indicate the values to be XORed together to
generate the content of that bit in the CRC register.
5. Di is the data input, with LSB input first.
6. D8 is the MSB of the input byte, and D1 is the LSB.
7. A substitution is made to reduce the table size, such that Xi = Di XOR Ci.
The results of the CRC are calculated one bit at a time and the resulting equations for
each bit are examined. The CRC register prior to any shifts is shown in Table 4.2. The CRC
register after a single bit shift is shown in Table 4.3. The CRC register after two shifts is
shown in Table 4.4.
This process continues until eight shifts have occurred. Table 4 lists the CRC register
contents after eight shifts. Xi was substituted for the various Di XOR Ci combinations. The
following properties were used to simplify the equations:
1. Commutative property (A XOR B = B XOR A).
2. Associative property ((A XOR B) XOR C = A XOR (B XOR C)).
3. Involution property (A XOR A = 0).
A study of Table 1.5 reveals two interesting facts:
The most-significant byte (bits R16–R9) of the CRC register is only dependent on
XOR combinations of the initial low-order byte of the CRC register and the input byte.
The least-significant byte (bits R8–R1) of the CRC register is dependent on the XOR
combination of the initial lower eight bits of the CRC register, the input data byte, and the
initial contents of the high-order bits of the CRC register.
This allows the next value of the CRC register to be calculated as an XOR of the input
data character bits, and a constant determined by the present contents of the CRC register. For
example, calculating a new value for R9 is accomplished by calculating X3 and X2 and
exclusive-ORing them together.
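The byte-at-a-time idea can be sketched in software with a 256-entry table that plays the role of the XOR equations. The sketch assumes the CRC-16 polynomial x^16 + x^15 + x^2 + 1 with data entering LSB first and a register preset of 0 (bit-reflected constant 0xA001); the register bit labelling in the tables may differ from this model, and the names are illustrative.

```python
POLY_REFLECTED = 0xA001   # x^16 + x^15 + x^2 + 1 with the bit order reversed


def crc16_bit_serial(data: bytes, crc: int = 0) -> int:
    """Reference model: one shift of the CRC register per data bit, LSB first."""
    for byte in data:
        for i in range(8):
            feedback = (crc ^ (byte >> i)) & 1
            crc >>= 1
            if feedback:
                crc ^= POLY_REFLECTED
    return crc


def build_table():
    """Effect of eight shifts on every possible value of X = D XOR C."""
    table = []
    for x in range(256):
        reg = x
        for _ in range(8):
            reg = (reg >> 1) ^ POLY_REFLECTED if reg & 1 else reg >> 1
        table.append(reg)
    return table


TABLE = build_table()


def crc16_bytewise(data: bytes, crc: int = 0) -> int:
    """Eight shifts at once: X combines the input byte with the low-order CRC byte."""
    for byte in data:
        x = (crc ^ byte) & 0xFF            # X_i = D_i XOR C_i
        crc = (crc >> 8) ^ TABLE[x]        # high-order CRC byte is XORed in unchanged
    return crc


serial_result = crc16_bit_serial(b"123456789")
bytewise_result = crc16_bytewise(b"123456789")
```

In this model the new most-significant byte depends only on X, while the new least-significant byte also involves the initial high-order byte of the register, which matches the two observations above.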
Table 4.4: CRC–16 Register after Two Shifts
For CRC–32, the differences are that data is now handled 16 bits at a time, the CRC register
is now 32 bits in length, and a different polynomial is used. Table 4.6 contains the XOR
information for the least-significant half-word (LSHW) of the CRC–32 register after 16 shifts,
and Table 4.7 contains the XOR information for the most-significant half-word (MSHW) of the
CRC–32 register after 16 shifts. Again, note that the MSHW depends only on XOR combinations
of the initial lower-order bits of the CRC–32 register and the input data. The LSHW depends
on XOR combinations of the initial lower-order bits of the CRC–32 register, the input data,
and the initial MSHW of the CRC–32 register.
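The same idea at 16 bits per step can be checked with a short Python sketch. It assumes the IEEE 802.3 CRC-32 polynomial in LSB-first form (0xEDB88320) with the usual all-ones preset and final inversion, since the text does not name the polynomial; the function names are hypothetical. The sketch recomputes the 16-shift constant on the fly instead of storing a 65536-entry table.

```python
import zlib

POLY32 = 0xEDB88320  # assumption: IEEE 802.3 CRC-32 polynomial, bit-reversed


def shift16(crc, halfword):
    """Reference model: shift 16 data bits through the register, LSB first."""
    for i in range(16):
        feedback = (crc ^ (halfword >> i)) & 1
        crc >>= 1
        if feedback:
            crc ^= POLY32
    return crc


def halfword_step(crc, halfword):
    """16 shifts in one step: a constant selected by X, XORed with the old MSHW."""
    x = (crc ^ halfword) & 0xFFFF      # X = data XOR initial lower-order bits
    return (crc >> 16) ^ shift16(0, x)  # old MSHW only reaches the new LSHW


def crc32_halfwords(data):
    if len(data) % 2:
        raise ValueError("this sketch only handles an even number of bytes")
    crc = 0xFFFFFFFF
    for i in range(0, len(data), 2):
        halfword = data[i] | (data[i + 1] << 8)  # first byte enters first
        crc = halfword_step(crc, halfword)
    return crc ^ 0xFFFFFFFF
```

Under these assumptions the result can be compared against the standard library, for example `crc32_halfwords(b"12345678") == zlib.crc32(b"12345678")`.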
PARALLEL ARCHITECTURE
Different techniques exist for parallel CRC generation, as follows.
1. A table-based algorithm for pipelined CRC calculation.
2. Fast CRC update.
3. F matrix based parallel CRC generation.
4. Unfolding, retiming and pipelining algorithm.
Figure 5.1: LUT-based architecture
Figure 5.3: Algorithms for F matrix based architecture
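The parallel data input is ANDed with each element of the F matrix, which is generated from
the given generator polynomial, and the result is XORed with the present state of the CRC
checksum. The final result is generated after (k + m)/w cycles, where k is the message length
in bits, m is the order of the generator polynomial and w is the number of bits processed in
parallel.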
F Matrix Generation
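The F matrix is generated from the generator polynomial as
\[
F = \begin{bmatrix}
p_{m-1} & 1 & 0 & \cdots & 0 \\
p_{m-2} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{1} & 0 & 0 & \cdots & 1 \\
p_{0} & 0 & 0 & \cdots & 0
\end{bmatrix}
\]
where {p0……pm-1} are the coefficients of the generator polynomial. For example, the generator
polynomial for CRC4 is {1, 0, 0, 1, 1}, and w bits are processed in parallel.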
As indicated above, having F4 available, a power of F of lower order is immediately obtained.
So, for example:
The same procedure may be applied to derive equations for this parallel version. In this case,
the matrix G is G = P’ and becomes:
X' = FW ⊗ (X ⊕ D)
Finally, the equation becomes
X’= FW ⊗ X ⊕ d
If w bits are processed in parallel, the CRC is generated after (k + m)/w cycles.
For CRC4 this equation expands as follows.
X3' = X2 ⊕ X1 ⊕ X0 ⊕ d3
X2' = X3 ⊕ X2 ⊕ d2
X1' = X3 ⊕ X2 ⊕ X1 ⊕ d1
X0' = X3 ⊕ X2 ⊕ X1 ⊕ X0 ⊕ d0
Fig. 3.4 demonstrates an example of parallel CRC calculation with multiple input bits, w = m = 4. The
dividend is divided into three 4-bit fields, acting as the parallel input vectors D(0), D(1), D(2),
respectively. The initial state is X(0) = [0 0 0 0]T.
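The CRC4 equations above can be reproduced with a small Python sketch. It assumes the usual construction from the parallel-CRC literature, in which the first column of F holds p(m-1) … p0 and the remaining columns form a shifted identity, and it reads {1, 0, 0, 1, 1} as p0 … p4. All function names are hypothetical.

```python
def build_f(p):
    """Return F (m x m, list of rows) for generator coefficients p = [p0, ..., pm]."""
    m = len(p) - 1
    f = [[0] * m for _ in range(m)]
    for row in range(m):
        f[row][0] = p[m - 1 - row]   # first column: p(m-1) ... p0
        if row + 1 < m:
            f[row][row + 1] = 1      # shifted identity
    return f


def mat_mul(a, b):
    """Matrix product over GF(2): AND for multiply, XOR for add."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def mat_pow(f, w):
    result = [[int(i == j) for j in range(len(f))] for i in range(len(f))]
    for _ in range(w):
        result = mat_mul(result, f)
    return result


def mat_vec(a, x):
    return [sum(a[i][k] & x[k] for k in range(len(x))) & 1 for i in range(len(a))]


def parallel_step(fw, x, d):
    """X' = FW (x) X xor D, with vectors ordered [x3, x2, x1, x0]."""
    return [s ^ di for s, di in zip(mat_vec(fw, x), d)]


def serial_step(f, x, bit):
    """One bit-serial step: multiply by F, then add the input bit into x0."""
    nxt = mat_vec(f, x)
    nxt[-1] ^= bit
    return nxt


F = build_f([1, 0, 0, 1, 1])
F4 = mat_pow(F, 4)
```

Under these assumptions the rows of `F4` are [0,1,1,1], [1,1,0,0], [1,1,1,0] and [1,1,1,1], which are exactly the coefficient patterns of X3', X2', X1' and X0' above, and one call to `parallel_step` gives the same state as four calls to `serial_step` with d3 entering first.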
Figure 5.4: Parallel calculation of CRC-32 for 32bit
The property of the FW matrix, together with the fact that the equation X’= FW ⊗ X ⊕ d
can be regarded as a recursive calculation of the next state X’ from the matrix FW, the current
state X and the parallel input D, makes the 32-bit parallel input vector suitable for messages
of any length, not only multiples of 32 bits. Remember that the length of the message is byte
based. If the length of the message is not a multiple of 32, then after a sequence of 32-bit
parallel calculations the remaining number of message bits can be 8, 16 or 24. For all these
situations, an additional parallel calculation with w = 8, 16 or 24 is needed, choosing the
corresponding FW. Since FW can be easily derived from F32, the calculation can be performed
using the equation X’= FW ⊗ X ⊕ d within the same circuit as the 32-bit parallel calculation;
the only difference is the FW matrix.
If the length of the message is not a multiple of the number of parallel processing
bits (w = 4), for example when the data bits are 11011101011, then the last two bits (D(3))
need to be processed after obtaining X(12). Therefore, F2 must be obtained from the matrix F4,
and the extra two bits are stored in the least significant bits of the input vector D. The
equation X’= FW ⊗ X ⊕ d can then be applied to calculate the final state X(14), which is the
CRC code. Therefore, only one extra cycle is needed for the extra bits if the message length
is not a multiple of w, the number of parallel processing bits. It is worth noting that in the
CRC-32 algorithm the initial state of the shift registers is preset to all 1s.
Therefore, X(0) = 0xFFFFFFFF. However, the initial state X(0) does not affect the
correctness of the design. For ease of understanding, the initial state X(0) is still set to
0x00000000 when the circuit is implemented.
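The handling of a short final group can be illustrated with a Python sketch on the 4-bit example. It assumes the CRC4 generator 1 + x^3 + x^4 and models "multiply by FW" as clocking the serial register w times with zero input, which is equivalent for this register structure. The 14-bit stream used in the usage note and all names are illustrative.

```python
M = 4                 # order of the generator polynomial
MASK = (1 << M) - 1
POLY_LOW = 0b1001     # assumption: g(x) = 1 + x^3 + x^4, coefficients p3..p0


def serial_step(state, bit):
    """One bit-serial step: shift left, feed back the old MSB, add the input bit."""
    msb = (state >> (M - 1)) & 1
    state = ((state << 1) & MASK) ^ bit
    return state ^ POLY_LOW if msb else state


def times_f_pow(state, w):
    """FW (x) state, modelled as w zero-input clocks of the serial register."""
    for _ in range(w):
        state = serial_step(state, 0)
    return state


def parallel_crc(bits, w=M):
    """Process w bits per cycle (w <= M); a short last group uses a lower power of F."""
    state = 0
    for start in range(0, len(bits), w):
        chunk = bits[start:start + w]   # the last chunk may hold fewer than w bits
        d = int(chunk, 2)               # a short chunk lands in the low-order bits of D
        state = times_f_pow(state, len(chunk)) ^ d
    return state


def serial_crc(bits):
    state = 0
    for b in bits:
        state = serial_step(state, int(b))
    return state
```

For a 14-bit stream such as `"11011101011000"`, `parallel_crc` takes three full 4-bit cycles plus one 2-bit cycle and returns the same state as `serial_crc`.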
In the proposed architecture, w = 64 bits are processed in parallel and the order of the
generator polynomial is m = 32. As discussed in section 3, if 32 bits are processed in parallel
then the CRC-32 is generated after (k + m)/w cycles. Increasing the number of bits processed
in parallel reduces the number of cycles required to calculate the CRC.
The proposed architecture can be realized by the equations below.
Xtemp = FW ⊗ D(0 to 31) ⊕ D(32 to 63)
X' = FW ⊗ X ⊕ Xtemp (11)
where
D(0 to 31) = first 32 bits of the parallel data input
D(32 to 63) = next 32 bits of the parallel data input
X' = next state
X = present state
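A Python sketch of one way to read Equation (11) follows. The source writes both multipliers as FW; for the result to agree with a bit-serial register, the sketch assumes that the first data word is advanced by 32 positions (F32) and the present state by 64 positions (F64). It also assumes the IEEE 802.3 polynomial in MSB-first form (0x04C11DB7). The names are hypothetical, and in hardware the two powers of F would be fixed XOR networks rather than loops.

```python
M = 32
MASK = (1 << M) - 1
POLY_LOW = 0x04C11DB7  # assumption: IEEE 802.3 CRC-32 polynomial, MSB-first form


def clock(state, bit=0):
    """One bit-serial step of the CRC register."""
    msb = state >> (M - 1)
    state = ((state << 1) & MASK) ^ bit
    return state ^ POLY_LOW if msb else state


def times_f_pow(vec, n):
    """Multiply by the n-th power of F, modelled as n zero-input clocks."""
    for _ in range(n):
        vec = clock(vec)
    return vec


def step64(state, d_first, d_next):
    """One 64-bit cycle under the stated reading of Equation (11)."""
    x_temp = times_f_pow(d_first, M) ^ d_next     # Xtemp = F32 (x) D(0 to 31) xor D(32 to 63)
    return times_f_pow(state, 2 * M) ^ x_temp     # X' = F64 (x) X xor Xtemp


def serial64(state, d_first, d_next):
    """Reference: feed the same 64 bits one at a time, first word first, MSB first."""
    for word in (d_first, d_next):
        for i in range(M - 1, -1, -1):
            state = clock(state, (word >> i) & 1)
    return state
```

With these assumptions `step64` and `serial64` return the same next state for any present state and any pair of 32-bit words.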
6.1 Introduction
A field-programmable gate array (FPGA) is an integrated circuit created to be
configured by the customer after manufacturing—hence "field-programmable". The FPGA
configuration is generally defined using a hardware description language (HDL), similar to
that used for an application-specific integrated circuit (ASIC) (circuit diagrams were
previously used to specify the configuration, as they were for ASICs, but this is increasingly
rare). FPGAs can be used to implement any logical function that an ASIC can perform. The
ability to update the functionality after shipping, partial re-configuration of a portion of
the design, and the low non-recurring engineering costs relative to an ASIC design offer
advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "connected
together"—somewhat like a one-chip programmable breadboard. Logic blocks can be
configured to perform complex combinational functions, or merely simple logic like AND
and NAND. In most FPGAs, the logic blocks also include memory elements, which may be
simple flip-flops or more complete blocks of memory.
6.2 Architecture
The most common FPGA architecture consists of an array of logic blocks (called
Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O
pads, and routing channels. Generally, all the routing channels have the same width (number
of wires). Multiple I/O pads may fit into the height of one row or the width of one column in
the array.
In general, a logic block (CLB or LAB) consists of a few logical cells. A typical cell
consists of a 4-input lookup table (LUT), a full adder (FA) and a D-type flip-flop, as shown.
In this figure the LUT is split into two 3-input LUTs. In normal mode these are
combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are fed
to the FA. In practice, all or part of the FA is implemented as functions in the LUTs in
order to save space.
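The normal-mode behaviour of such a cell can be modelled in a few lines of Python. This is a behavioural sketch only: the truth tables are plain integers, the names are hypothetical, and the 4-input XOR is chosen because XOR networks dominate CRC logic.

```python
def lut3(table, a, b, c):
    """Read one bit from an 8-entry truth table addressed by (a, b, c)."""
    return (table >> ((a << 2) | (b << 1) | c)) & 1


def lut4(table_lo, table_hi, a, b, c, sel):
    """Normal mode: a 2:1 mux driven by the fourth input picks one 3-input LUT."""
    return lut3(table_hi, a, b, c) if sel else lut3(table_lo, a, b, c)


XOR3 = 0x96   # bit n is 1 when address n contains an odd number of 1s
XNOR3 = 0x69  # bitwise complement of XOR3


def xor4(a, b, c, d):
    """4-input XOR built from two 3-input tables and the mux."""
    return lut4(XOR3, XNOR3, a, b, c, d)
```

Changing the two table constants reconfigures the cell to any other 4-input function without touching the surrounding structure, which is the sense in which the logic block is programmable.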
6.4 Applications
Applications of FPGAs include digital signal processing, software-defined radio,
aerospace and defense systems, ASIC prototyping, medical imaging, cryptography,
bioinformatics, computer hardware emulation, radio astronomy and metal detection. The
inherent parallelism of the logic resources on an FPGA allows for considerable computational
throughput even at low MHz clock rates.
The flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel arithmetic units.
This has driven a new type of processing called reconfigurable computing, where time
intensive tasks are offloaded from software to FPGAs.
Historically, FPGAs have been reserved for specific vertical applications where the volume
of production is small. For these low-volume applications, the premium that companies pay in
hardware costs per unit for a programmable chip is more affordable than the development
resources spent on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
PROPOSED METHOD
The proposed method is a unique way of implementing multiple-bit error detection and
single-bit error correction using CRC for frame widths of 24 bits and 32 bits. Let Ftr be the
transmitted frame, in which the checksum is appended after 16 or 8 bits of data. We
can express Ftr as shown in Equation (1), where & denotes concatenation:
Ftr = Dtr & Ctr ---------------------(1)
At the receiver side, let Fre be the received frame as shown in Equation 2.
The receiver again calculates the CRC on the received data. Let Ccal denote the CRC
calculated over Dre at the receiver side. If no error has occurred during transmission, then
Cre and Ccal are equal. But if some bit(s) are in error, then Cre and Ccal will not
match. In such cases the error needs to be detected and corrected. Hence we
need to calculate the syndrome, which is given by:
Using Equation 4, the syndrome can be calculated in a minimum number of clock cycles.
This method of syndrome calculation was proposed in earlier work and is an efficient one,
hence we have adopted it.
This method uses two standards, CRC-16 and CRC-8, for correcting single-bit
errors. We have designed the algorithm to detect more than one error, and a flag is raised to
request retransmission of the frame if more than one error is found.
There are two cases in which a single-bit error can occur. In the first case, the error
occurs in the data bits. In the second case, the error occurs in the checksum bits. The
total number of possible single-bit errors in data for both standards is shown in
Table 8.1.
For CRC-16, if i is the position of the error, then 1 ≤ i ≤ 16 for case 1 and
17 ≤ i ≤ 32 for case 2. Similarly for CRC-8, only the limits change.
In the proposed method we have used a new technique for detecting errors in both
cases. This method follows from the fact that if an error occurs as in case 2, the
syndrome pattern will have a number of 1s equal to the number of bits in error. Since only
errors in the data bits have to be corrected, there is no need to correct an error that has
occurred in the checksum bits, but such errors still need to be detected. This method of
analysis is advantageous since it reduces memory requirements by 76%, and because only errors
in the data need to be corrected, it reduces the load on the computational block of the receiver.
To explain the proposed method of correcting a single-bit error, let us
consider an example of a (7,4) CRC code. The syndrome generator circuit for this codeword is
shown in Figure 7.1. This circuit of shift registers is similar to the circuit shown in
Figure 1, except that it has only three shift registers, corresponding to the three check
bits in the codeword.
Figure 7.1: syndrome circuit for (7,4)
Here the generator polynomial used is g(x) = 1 + x + x^3. The received vector is Z = 1110101,
with three check bits and a nibble of data. In this codeword the first three MSBs are
check bits, which are concatenated with a nibble of data. Here the 3rd MSB, that is, the 3rd
check bit, is in error, so we need to detect and correct it. In the conventional method the
output of the circuit in Figure 2, which is the syndrome, is used to address a look-up table
that stores the error pattern. This error pattern is then XORed with the CRC frame to obtain
the correct data.
In the proposed method the same circuit in Figure 2 is used, but more
clock cycles are required. After all 7 received bits have been entered
into the syndrome calculator, 0s are fed into it from the 8th shift onwards, as shown in Table
4. Each time a 0 is fed into the circuit, the shift register contents are tabulated. This
process of feeding 0s continues until the shift register contents read S0 S1 S2 = 100. In
general, for (n-k) shift registers, the contents should read S0, S1, ..., Sn-k-1 = 1 0 0 ... 0,
i.e., a 1 followed by (n-k-1) 0s. In Table 7.2, we find that at the 12th
shift the shift register contents are 100. The error is then located and corrected as given.
As we can see from Table 7.2, after the 7th shift we obtain the syndrome from the
circuit in “fig 2”. Since this syndrome is not equal to “000”, it indicates an error. The
procedure is then continued as explained earlier until the 12th shift, when the shift register
content is “100”. The number of shifts beyond the 7th indicates the position of the error:
12 − 7 = 5, so the 5th bit counting from the right is in error. Therefore the error pattern
is E = 0010000.
The corrected codeword is V = Z ⊕ E = 1100101.
This is the same method employed in correcting single bit error in CRC-8 and CRC-16.
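The (7,4) example can be traced with a short Python sketch. It assumes the usual divider wiring for g(x) = 1 + x + x^3 (feedback from S2 into S0 and S1) and that the rightmost bit of the received vector enters the register first; both assumptions reproduce the 12-shift count quoted above. The function names are hypothetical.

```python
def clock(state, bit):
    """One shift of the syndrome register (S0, S1, S2) for g(x) = 1 + x + x^3."""
    s0, s1, s2 = state
    return (bit ^ s2, s0 ^ s2, s1)


def locate_and_correct(z):
    """Return (corrected vector, error pattern, total shifts) for a 7-bit string z."""
    n = len(z)
    state = (0, 0, 0)
    for ch in reversed(z):            # rightmost bit enters first
        state = clock(state, int(ch))
    shifts = n
    if state == (0, 0, 0):            # zero syndrome: nothing to correct
        return z, "0" * n, shifts
    while state != (1, 0, 0):         # feed 0s until the register reads 100
        state = clock(state, 0)
        shifts += 1
    extra = shifts - n
    pos_from_right = extra if extra else n
    index = n - pos_from_right        # 0-based index counted from the left
    error = "".join("1" if i == index else "0" for i in range(n))
    corrected = "".join(str(int(a) ^ int(b)) for a, b in zip(z, error))
    return corrected, error, shifts
```

Calling `locate_and_correct("1110101")` returns `("1100101", "0010000", 12)`, matching V, E and the shift count in the worked example.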
Figure 7.2: VLSI Architecture
7.3 State diagram of control unit:
The Timing and Control Unit is designed using the state diagram shown in “fig
4”. It consists of four states. Initially, in S0, the syndrome must be computed. This
requires a delay of 32 clock cycles, and the received bits must be routed to the syndrome
calculator circuit. This is achieved by setting the mode pin high and the Sel1 pin low, which
connects the clk input to the 5-bit counter and the CRC frame to the syndrome circuit,
respectively. When the Top signal of the counter becomes 1, it indicates completion
of the required delay. The state then transitions to S1, where the mode pin is turned
low so that the counter can count the number of 1s in the syndrome vector. The counter
content C is sent back to the control unit. The syndrome is also checked for zero; if true,
the NbE signal goes high. The state jumps to S2 if C=1 or NbE=1, and the select pin Sel2 is
made low, which makes the mux connect the received data to the corrected data. If C≠1 and
NbE=0, control jumps from S1 to S3, which indicates that a single-bit error is present in the
data bits. In S3 the control unit sets Sel1 high, which feeds 0s into the syndrome
calculator. After each clock cycle the syndrome is XORed with “10000…..0” and the result is
checked for zero using the same generic NOR gate. If true, SbE goes high, indicating
completion of the process; if false, the error pattern register is shifted left as explained
in the previous section.
The control unit stays in S3 while SbE is 0. Once SbE is 1, Sel2 is set as 0 to
connect the output corrected data to a pattern obtained by XORing the received data with the
content of the error pattern register.
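The decision flow of the control unit can be sketched in Python on the (7,4) toy code rather than the 32-bit frame. The sketch assumes that the syndrome is the remainder of the whole received frame modulo g(x), that bit i of an integer stands for the coefficient of x^i, and that a bounded search which never reaches the pattern 100…0 would trigger retransmission. The function names and returned labels are hypothetical.

```python
G = 0b1011   # g(x) = 1 + x + x^3; bit i holds the coefficient of x^i
N, R = 7, 3  # frame length and number of check bits


def poly_mod(value, gen=G):
    """Remainder of value divided by gen over GF(2)."""
    deg = gen.bit_length() - 1
    while value.bit_length() > deg:
        value ^= gen << (value.bit_length() - 1 - deg)
    return value


def receive(frame):
    """frame: bits 0..R-1 are check bits, bits R..N-1 are data."""
    syndrome = poly_mod(frame)             # S0: shift the frame into the syndrome circuit
    ones = bin(syndrome).count("1")        # S1: count the 1s in the syndrome
    if ones == 0:
        return frame >> R, "no error"                 # NbE = 1, go to S2
    if ones == 1:
        return frame >> R, "checksum bit error"       # C = 1, data passed through in S2
    shifts = 0                             # S3: feed 0s until the pattern 100 appears
    while syndrome != 1 and shifts < N:
        syndrome = poly_mod(syndrome << 1)
        shifts += 1
    if syndrome != 1:
        return None, "retransmit"          # never taken for this toy code
    frame ^= 1 << (N - shifts)             # flip the located data bit
    return frame >> R, "data bit corrected"
```

With the codeword 1100101 of the worked example stored as the integer 83, `receive(83)` reports no error, `receive(83 ^ 4)` reports a checksum bit error and passes the data through unchanged, and `receive(83 ^ 32)` corrects the data bit. Because the (7,4) code is perfect, every nonzero syndrome maps to a single-bit error, so the retransmit branch is shown only to mirror the flag described in the text.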
INTRODUCTION TO XILINX
8.1 Introduction
Xilinx ISE is a software tool that is used to design, synthesize, simulate, test and
verify digital circuit designs. The designer (you in this case) can describe the digital design
by using either the schematic entry tool or a hardware description language. In this tutorial,
we will create VHDL design input files (the hardware description of the logic circuit),
compile the VHDL source files, create a test bench and simulate the design to verify its
correct operation (functional simulation). The purpose of this tutorial is to give
new users exposure to the basic steps needed to implement and examine their own
designs using the ISE environment. In this tutorial, we will design one simple module (an OR
gate); in the future, you will design more such modules and build the overall circuit
design from these existing files.
As you will learn (or have learned) in this course, there are different styles for the architecture
body:
● Behavioral – set of sequential assignment statements
● Data Flow – set of concurrent assignments
● Structural – set of interconnected components
A combination of these could be used, but in this tutorial we will use Dataflow. In its simplest
form, the architectural body will take the following format, regardless of the style:
architecture architecture_name of entity_name is
begin
… -- statement
end architecture_name;
Figure 8.1: Xilinx Project Navigator window
4. Workspace – used to view and edit source files, multiple files can be opened
simultaneously and the name of each file will be shown in a separate tab in the bottom of
workspace window to enable you to switch between different files
2. In the “Name” field enter a short name for your project that correctly describes what you
are designing (For now we will use “ORgate”). Also, make sure that your project name:
● Starts with a letter
● Contains only alphanumeric characters and underscores
● Cannot contain two consecutive underscores.
3. Click the Browse icon (pointed by the arrow in the Figure above) in order to select the
desired location to which you would like to save your project.
4. In the “Top-level source type” field, make sure that HDL is selected – this is selected if
the top level design to be used is in VHDL or Verilog, which can include lower level modules
such as HDL files, schematics or different types.
5. Click “Next”.
6. In the “Project Settings” page shown below, ensure that the following options are set
because they affect the types and processes that will be available for your design:
● “Product Category” All
● “Family” Spartan3E
● “Device” XC3S500E
● “Package” FT256
● “Speed” -5
● “Top-Level Source Type” HDL (automatically selected)
● “Synthesis Tool” XST (VHDL/Verilog), which is a technology to synthesize VHDL,
Verilog, or mixed language designs to create “Xilinx-specific netlist” files.
● “Simulator” ISim (VHDL/Verilog), allows for running integrated simulation process
as part of your ISE design flow.
● “Preferred Language” VHDL
7. Leave the remaining fields as their default settings.
8. Click “Next” and you will be presented with a summary of your new project as shown in
Figure
In order to open an existing project in Xilinx, select File > Open Project to show the
list of projects available in a certain directory, choose the project you want and click “OK”.
9. Click “Finish” and you will exit the “New Project Wizard” and be taken back to the
original “ISE Project Navigator” window, but a new project hierarchy is generated with the
“ORgate” design file displayed in the “Hierarchy Pane” as shown in Figure;
● Click on the “New Source” icon, which is to the left of the “Hierarchy Pane.”
This can also be done by right clicking on “ORgate” source file in the “Hierarchy
Pane” and clicking “New Source,” as shown in Figure 7. This will take us to the
“New Source Wizard” as shown Figure 8.
Figure 8.4: New Source Wizard
● Select “VHDL Module” as a source file type to be added to the project since our
files will contain VHDL design code, so our files will have “.vhd” extension.
● In the “File name” field, enter the name of the entity for which you are creating
input and output ports. Remember to follow the conventions mentioned earlier
(in Section 4, step 2) for naming the project. In this case, enter “ORgate”.
● For the “Location” field, click the browse icon to navigate to the appropriate
folder, which should be the same one used for creating the project.
● Make sure that the “Add to project” checkbox is selected to automatically add
this source to your project so that you don’t need to add it to the project again
manually.
● Click “Next”, the wizard will take you to the “Define Module” page as shown
below, where I/O of the module (OR gate) will get defined. As you can see, the
entity name is there, but can be changed if you want and the architecture name is
“Behavioral” by default.
● “Direction” field is used to describe the mode, which is how data is transferred
through the port. We are concerned with 3 modes: in – data flowing into the port;
out – data flowing out of the port; inout – data flowing into and out of the port
(bi-directional). Since we have 2 inputs and 1 output, in the first 3 fields under
port name, we type “a”, “b” and “c” and set the “Direction” fields as “in” for the
first two fields and “out” for the third field (c).
● Click “Next” to view and verify the summary of the information about the new
source created. If any changes are to be made, just click cancel.
● After making sure that the description of the module is correct, click “Finish.”
The source file will be now displayed in “ISE Project Navigator” as shown
below; the workspace window will be used as a text editor to make necessary
changes to the source file. All the input and output ports that we specified will be
displayed.
8.5 Synthesizing VHDL Code
The design has to be synthesized before it can be checked for correctness by running
functional simulation. XST analyzes the VHDL code and tries to infer building blocks in
order to create an efficient implementation, performing resource sharing to reduce area
while increasing clock frequency. In other words, synthesis converts the code into a digital
circuit by transforming it into a netlist of gates.
● Make sure that “Implementation” checkbox is checked from the “View Pane” in the
“Design” Panel.
● From the “Process Pane” in the “Design” Panel, double click on the “Synthesize –
XST” function as shown in Figure 13. This checks the syntax of your code and
reports any warning and error messages in the “Transcript Window”,
where you can click the “Errors” or “Warnings” tab. Errors and warnings are each
marked by an icon next to the message. You can right click a message and
select “Search for Answer Record” to open the Xilinx website and show any related
answer, or select “Go to Source” to go directly to the error. Errors must be
corrected and saved, and synthesis (compilation) must be run again before you move
to the next step; otherwise, you won’t be able to simulate your design. Once the
errors (if any) are corrected, the synthesis process runs without errors and a status
icon is displayed to the left of “Synthesize – XST”.
● After a successful Synthesis, you will get a message as shown in Figure
● In the “File name” field choose a name that signifies the test bench and adheres to
the naming conventions mentioned earlier. Type “testorgate”
● For the “Location” field, click the browse icon to navigate to the appropriate
folder, which should be the same one used for creating the project.
● Click “Next”
● The following window allows you to select which design you want to create a test
bench for, in our case “ORgate” since it is the only module we have; however, for
your future designs, you can make test benches for individual components of your
designs as well as the top-level design which ties it all together
● Click “Next”
● A summary window like the one shown below will appear, click “Finish”
● Now you will view the test bench file (testorgate.vhd), shown below, that Xilinx
has generated in the workspace window.
● Let’s modify the default code by removing the highlighted code shown below,
which is the clock process that is generated by default, which divides the clock
period by two. We also want to remove the stimulus process.
● Replace the deleted code with the following code segment, which will perform a
very simple initial test of the design for simulation by giving different values of
inputs:
● The test bench file does not appear in the “Hierarchy” Pane of the “Design”
Panel. This is because there is a separate view for implementation and test files. In
order to view test files, select the box of “Simulation” in the “View Pane” of the
“Design” panel. In the “Process Pane,” double click on the “Behavioral Check
Syntax” to make sure that you didn’t make any syntax errors while making
changes.
● Save your work.
● Double click on “Simulate Behavioral Model” in the “Process Pane”, which
will open the ISim software with your test bench loaded.
● ISim simulator window will open with your simulation executed, as shown in
Figure 22, where you are able to simulate your designs and check for errors. You
can step through your VHDL designs and check the states of signals and set the
simulation to run for specific period of time. Make sure to check the results of the
simulation output against your truth table results to verify the correctness of the
design. The resolution of the simulation is set to 1 picosecond to ensure correct
processing of your design.
● To get a better view of the simulation waveforms, from the tool bar click on View >
Zoom > Full View, use F6, or click on the “Zoom to Full View” shortcut
icon. This will give you a better view of what your simulation is doing.
● In the text box located near the run button, you may specify amount of time for
the simulation to run; the button to the left of the box will execute the simulation
for the time you have specified. After setting the new simulation time, click on
Re-Start to clear the previous simulation result and then click on Run to
start simulating with new time setting. Below is an example of 2us of simulation
time
● You can change the default simulation run time. This will help to avoid setting the run
time again every time you launch the simulation. This can be done by setting the
properties of your project in Xilinx.
● Right click on Simulate Behavioral Model, and then click on Process Properties…
An ISim property window will appear. You can modify the simulation run time from
value textbox as in Figure
● In some cases, you want to change the display format of a specific signal from binary
format into other format. This can be done by doing a right click on that signal, then
click on Radix and choose your desired display format. Below is an example of
changing display format from binary number to hexadecimal number:
● When your design is big, it is not easy to find a mistake just by looking at your HDL
code. In this case, you may want to see the internal signals of a specific component
to check whether it is working properly. To do this, you will need to open both the
Instances and Processes panel and the Objects panel.
CONCLUSION
CRC is a method that can detect errors in data transferred between two points.
In this work CRC has also been used for single-bit error correction. We have implemented
CRC-16. Hardware implementation on an FPGA can be effectively used to improve the
performance of CRC calculations.
Only components such as shift registers, XOR gates and NOR gates are used, so the entire
circuit can be easily designed. This approach is efficient in terms of hardware and speed.